[Development] On the reliability of CI

Thu Oct 25 19:11:19 CEST 2012

Shawn Rutledge spaketh:
>  Personally I think the fundamental problem which CI could do better is to
> triage problems.  <snip>,
>
> I think when a test fails, the CI system should try to break down the
> patch set in some way.  For example it could divide the patch set in half,
> arbitrarily, and see if half of them will integrate successfully, then
> the other half, and continue this recursively until the one bad patch is
> found, or at least a smaller subset.
>
> <snip>, But this is a technical problem, seems like it should have
> a technical solution.  I can only imagine for example that Google
> has a better system for internal development, I just don't know what it is.

Agree with your post.  It's work, though.

As for Google, I've read extensively about their internal engineering
tools (build, test, regression, distribution, etc.).  The official
"Google Engineering Tools" blog is:

<http://google-engtools.blogspot.com/>

There's some good stuff in there about CI systems, and it's similar to
what you propose.  Example blog post there:

<http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html>

Summary for those who don't want to RTFA:

(1) Check-ins trigger build-and-test activity before they are
considered "found-good"

(2) While previous build-and-test activity is in progress, new
check-ins are "queued"

(3) Build-and-test is "predictive-optimized" to build-and-test only
what is impacted by the actual changes (only run the "needed" tests,
not "all-the-tests")

(4) Upon "fail", it unwinds the grouped/queued check-ins to find the
failing commit and removes it (allowing the others to pass --
same as what Shawn suggested)
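
The halving idea from Shawn's post and point (4) can be sketched in a
few lines of Python.  This is only an illustration under assumptions,
not how Google's system (or any real CI) is implemented: the names
find_failing_subsets and integrates are hypothetical, integrates stands
in for "apply these patches on top of the known-good base and run the
tests", and the patches are assumed to apply independently of each
other.

```python
from typing import Callable, List, Sequence


def find_failing_subsets(
    patches: Sequence[str],
    integrates: Callable[[Sequence[str]], bool],
) -> List[List[str]]:
    """Recursively halve a failing patch set (hypothetical helper).

    Returns the smallest failing subsets found.  A subset of size one is
    a single bad patch; a larger subset means the failure only shows up
    when those patches are combined.
    """
    patches = list(patches)
    if not patches or integrates(patches):
        return []                      # nothing to blame in this group
    if len(patches) == 1:
        return [patches]               # isolated one bad patch
    mid = len(patches) // 2
    left = find_failing_subsets(patches[:mid], integrates)
    right = find_failing_subsets(patches[mid:], integrates)
    if not left and not right:
        # Each half passes alone, so the failure is an interaction:
        # report the group itself as the "smaller subset".
        return [patches]
    return left + right


# Example with a fake CI run: patch "d" breaks the build on its own.
queued = ["a", "b", "c", "d", "e"]
culprits = find_failing_subsets(queued, lambda ps: "d" not in ps)
print(culprits)  # [['d']]
```

Each call to integrates is a full build-and-test run, so the cost is
roughly logarithmic in the size of the group when there is one bad
patch, which is the appeal over testing every patch individually.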

Google claims this system works on their C++ code base with 20+ code
changes per minute, and 50% of the files changing each month.
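
For point (3) above, a minimal sketch of picking only the "needed"
tests, again in Python.  It assumes a dependency map from each test to
the files it depends on is already available; the function name
affected_tests, the map, and the file names are all made up for
illustration and are not taken from Google's tooling.

```python
from typing import Dict, Iterable, List, Set


def affected_tests(
    changed_files: Iterable[str],
    test_deps: Dict[str, Set[str]],
) -> List[str]:
    """Return the tests whose dependencies include a changed file.

    test_deps is a hypothetical map: test name -> set of source files
    the test (transitively) depends on.
    """
    changed = set(changed_files)
    return sorted(t for t, deps in test_deps.items() if deps & changed)


# Hypothetical dependency map for three tests.
deps = {
    "test_widgets": {"widgets.cpp", "core.cpp"},
    "test_network": {"network.cpp", "core.cpp"},
    "test_sql": {"sql.cpp"},
}
print(affected_tests(["widgets.cpp"], deps))  # ['test_widgets']
print(affected_tests(["core.cpp"], deps))     # ['test_network', 'test_widgets']
```

The hard part in practice is keeping the dependency map accurate; the
selection itself is a simple set intersection.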

My conclusion:  Brilliant.  But it sounds like dedicated resources are
required to get the tooling in place and to maintain it (probably on
an ongoing basis; we all know regression-test maintenance can be
expensive despite being essential).

--charley