[Development] Qt CI reliability

Sean Harmer sean.harmer at kdab.com
Tue May 3 15:45:03 CEST 2016


On Monday 02 May 2016 05:05:07 Tuukka Turunen wrote:
> Hi Sean,
> 
> First of all, I do apologize for the inconvenience caused by the CI system. We
> are fully aware of the situation, and the effect it has on productivity.
> All the developers of The Qt Company are using exactly the same CI system.

Yes, that is a lot of people to be held up by an oft-times unreliable piece of 
infrastructure.
 
> To address the problems we had with the Jenkins-based CI, we started to
> develop a new CI system built from the ground up to serve the needs of Qt.
> Unfortunately, it has taken us longer than we anticipated to get the new CI
> system stable, and there are still a lot of failures caused by the CI
> itself. We are also continuously improving the test asset to make it less
> prone to errors, including identification of flaky cases and fixing and/or
> temporarily blacklisting these.

The flaky test related failure rate has indeed improved a great deal over the 
last 12 months. Thanks to all who have helped with this. The problems we're 
suffering now seem to be more related to infrastructure issues as Jędrek has 
pointed out.

I think part of the problem is also perception. From outside of TQC, 
contributors have no visibility of the status of an integration beyond the 
gerrit "INTEGRATING" status. This turns it into a black box that can take most 
of a working day to result in a frustrating failure.

Would it be possible for you to expose the view of the currently running 
integrations and their status on each node/configuration so that we can see if 
something looks like it might be broken and can approach a sysadmin?
 
> While we unfortunately do not have 24x7 sysadmins,

And this is a problem imho. The CI is a critical system that needs to be 
running 24x7 to support people in different timezones and during out of hours 
work in Europe. If I'm busy with paying project work during office hours, then I 
try to do what I can on Qt 3D out of normal hours but several times I've had 
to waste hours at weekends and evenings trying to shepherd changes through. 
This in turn then just adds to the load on the CI system during office hours 
when the system is able to integrate once again.

So in summary, I appreciate the CI is a big complex beast but it's also the 
gateway to getting contributions in, so it is critical that it runs all 
of the time, or at least as much of the time as possible. Can you investigate 
the feasibility of putting 24x7 support in place please?

> we do have people
> dedicated to operating the CI system, as well as support from IT as needed
> for the infrastructure. In addition, we are still putting a significant
> effort into developing the CI further and stabilizing it. And, yes, we are
> also continuously monitoring the infrastructure of the CI systems as well
> as planning how to improve it further in the future.

Well, disks filling up over a weekend shows that it isn't monitored 
continuously, or rather, not acted upon. If there isn't 24x7 support then this 
is a valid outcome of course. My argument is that such support is warranted.

Kind regards,

Sean
-- 
Dr Sean Harmer | sean.harmer at kdab.com | Managing Director UK
KDAB (UK) Ltd, a KDAB Group company
Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
Mobile: +44 (0)7545 140604
KDAB - Qt Experts


