[Development] On the reliability of CI

Thu Oct 25 09:57:14 CEST 2012

On Thursday, October 25, 2012 02:32:49 PM Lincoln Ramsay wrote:
> On 25/10/12 13:00, Rohan McGovern wrote:
> > True, there used to be Nokia employees reading every failure report and
> > chasing up apparently unstable tests, either trying to fix the tests, or
> > acknowledge them via bug reports and marking them insignificant.
> > Those people are gone and the test results are likely to be less stable
> > until they're replaced
> 
> This.
> 
> The QA guys in Brisbane did an awesome job that was perhaps not so
> obvious or visible to people outside of the office. Not only did they
> keep the CI system running and stable, they poked, prodded and tweaked
> the Qt product so that it could pass through the CI system quickly
> (raising bugs as appropriate when tests were broken or flaky).

+1

I'd like to express my support for what Lincoln and Rohan said and also stress
that Rohan (as well as Sergio, Janne, etc.) is doing an absolutely
outstanding job in helping to keep things going (despite management
directives, which I hope are a thing of the past).

I've had the "pleasure" of helping to nurse qt5.git integrations over the
past weeks, and I've found that most of the time failing integrations are
the result of bugs in our code. Sometimes sloppiness, sometimes hard-to-find
bugs.

I invite everyone in the community who is annoyed with the CI system to help
improve it. It doesn't require any special network access (trust me, I don't
have such access right now). Rohan has done a great job in making sure that as
much information as possible is publicly accessible, including extensive
build logs and the source code of the scripts that power the entire system.

> I'm pretty sure there's someone at Digia ready to take over maintenance
> of the CI system. However, there isn't (to my knowledge) anyone ready to
> take on the task of keeping Qt in a state that can pass through the CI
> system. If nobody steps up to take on this responsibility then it'll
> fall on everyone to ensure their stuff is getting through CI.

One approach we could consider is what's called "Gardening" in WebKit:
Introduce a roster of people on duty who can help nurse things and push
them through the CI system. It's something anyone could help with,
regardless of their employment.

Perhaps in an ideal world something like that wouldn't be needed. But I have
the strong feeling that even if we had a super fast CI system that allowed
building and auto-testing the majority of individual commits separately within
say 10 minutes, we'd still have a fair amount of those subtle issues where an
innocent change in one module makes a test in another, seemingly unrelated
module flaky, breaking say a qt5.git integration. I'm afraid that with a code
base of the size we're presented with we have to accept a certain amount of
this.

However, that should in no way stop us from investing time in improving the CI
system as is, i.e. trying to make it faster and more reliable. I hope that the
transition to Jenkins will make it easier to develop the system itself from
the "outside", e.g. by adding experimental nodes to try out new approaches to
incremental builds, etc.

It might even be a fun "hackathon" for the next Qt Contributors' Summit.

Simon