[Development] Periodical digest from the CI (2017/05) - CI Performance, HW and SW changes

Tue May 30 09:15:15 CEST 2017

Hi all!

I was due to write you something about the CI. This time I’ll cover the topics performance as well as upcoming hardware and software changes.

Performance:

You have all noticed that the CI system is behaving poorly to what comes to the performance. Sometimes autotests take over 30 times longer to run compared to a normal situation. Now why is that?

We actually have different kinds of bottlenecks. One is the bandwidth with which the virtual machines (VMs) store local data to their virtual hard drives. The servers on which our virtual machines run have no local hard drives. They instead store all their data on a centralized storage called the Compellent (https://en.wikipedia.org/wiki/Dell_Compellent). So when a VM wants to store data on its virtual hard drive, the host it runs on actually stores the data on the centralized storage.

We have several generations of hardware installed, and they have different speeds on their SAN interface with which they are connected to the Compellent. As your build picks up a server, it can be a new rack server, or it can be an older generation Blade (https://en.wikipedia.org/wiki/Blade_server). Also, all the other VMs on these servers share the same bandwidth, so depending on what the other builds do, your SAN connection can be affected. Sadly, our test of prioritization these VMs, didn’t produce expected results and really didn’t much at all.

These generations of hardware and type of hardware also affect the amount of other VMs the hardware can run simultaneously.  We have Mac mini’s running our macOS builds. Those generally run 1 VM per physical Mac mini. Then we have old Blades that run around 4 VMs per Blade. The latest additions to our hardware pool are dual socket 20 core CPU server racks. Those run up to 26 VMs simultaneously. Running more on the same hardware reduces costs for us, but also increases the odds of one build affecting the next.

Another bottleneck is the Compellent itself. The storage system has 120 + 10 spare hard drives spinning at 15K RPM. However great the IOPS performance is in that system, when we decide to start 200+ VMs in the CI, it goes down to its knees. And when it does that, all builds and autotest run are affected. You could think of this as you having 2 computers at home sharing the same spinning disk.

Now that all is grim and morbid, let’s continue with the good news.

Upcoming hardware changes:

We’re replacing the current hardware stack with a completely new one. The parts have arrived and are being installed as I type right now. Not only did we acquire new hardware that is faster, but we also redesigned the building concepts so that they utilize the hardware differently. The new hardware can be easily expanded and we designed the system so, that we don’t produce bottlenecks even when expanding it.

Before I go in to details, I need to explain a bit how the CI systems generally works. So heading off on a tangent here! When developer stages a commit in Gerrit (codereview.qt-project.org), the CI picks it (or multiple commits) up. The CI or Coin as the piece of software is figuratively called, generates work items based on the data received. If let’s say the commits was for QtDeclarative, Coin now produces work items for itself to handle that QtDeclarative build on top of circa 30 different target platforms. Each of these work items depend on QtBase to be built. So now Coin also creates these circa 30 work items for QtBase. As QtDeclarative is built on top of the current qt5.git’s QtBase, it means in normal situations that QtBase has already been built previously. These artifacts have been stored by Coin and can now be reused. So instead of rebuilding QtBase for QtDeclarative, Coin simply checks its storage and links the previous builds to the current one and promptly continues with building QtDeclarative. This is the major change in how we build Qt nowadays compared to old days with Jenkins where every build always rebuilt the entire stack up to its point.

Continuing into more details. Whenever Coin starts to build something, it needs a VM for the task. We have “templates” in vSphere that represent different operating systems. They are virtual machines that have operating systems installed in them along with some basic things set up like user accounts and SSHD etc. Then they have been shut down ready to be used. Now when a build needs a VM, it clones a template VM and launches the clone. The clone is actually only a linked clone. This means that we don’t really clone anything, but only create a new virtual machine that links or points to the original one. Now, when the new clone is powered on, it _reads_ from the original template, but all changes are _written_ to its own file called the ‘delta image’. This way a new virtual machine only takes up space that’s equal to the amount of data it has written.

Going back to the template again. I said that it only contained basic things like user accounts and SSHD. A build surely needs more than that. We need Visual Studios, MINGW, CMake, OpenSSH, XCodes, MySQL etc installed as well. Those things are ‘provisioned’. In qt5.git we have a folder structure under /coin/provisioning that contains scripts that install these things. As there is no point in running them every time for every VM, we create yet another set of templates that contain these pre-installed. We call these TIER2 images (or templates) vs TIER1 images being the vanilla distros containing only the basic things enabling us to even use them.

TIER2 images work pretty much the same way as QtBase was a dependency for QtDeclarative. Each build we trigger checks the current configurations scripts from qt5.git and makes a SHA from the folder structure. This SHA is used in naming the TIER2 image. If the content we want to install has changed, we have to regenerate a new TIER2 image. This is called the provisioning and it’s triggered automatically if the requested TIER2 image doesn’t exist.

Now, let’s go back on track and talk about the hardware changes.

The new servers have local SSD drives that work as the storage for the VMs instead of a centralized storage. This removes the bottleneck of a SAN network and reduces latencies while at it. And while being SSD drives, they are faster by design to what the Compellent used to be with its rotating discs. We still have a Compellent, but this time it’s filled with SSD drives. While the VMs use local SSD drives on the hosts themselves to store data, reading is a more complicated thing. The TIER1 and TIER2 images described earlier are still stored centralized on the Compellent. This saves us the transferring of the images to each server serving as the host for the VMs. These TIER2 images are cloned as normal, and then the read operations point to the source. This would cause the same situation as with the old system where everything is read from the Compellent, but we are relying on caches to work in our favor here. The TIER2 images are shared via NFS, and the host OS on the server is equipped with a 500 GB NFS cache. So whenever something is read from the TIER2, it is in fact now read from the NFS cache that’s local. All this is obviously assuming that the data has been read once previously. In practice, if a TIER2 image gets updated, the data has to be read from the centralized storage once, and then it’s in cache for the rest of the builds. We also have to remember that not the entire TIER2 image is read whenever data is read. If a build requests openssh.so, only those blocks containing the file are read.

We also need the Compellent to provide us with redundancy for critical systems and a huge data storage for data that can’t be stored distributed. Critical systems include our own infrastructure and the storage is needed for all kinds of data including our release packages, install packages, distro ISO images etc. So even if we had a good mechanism to distribute the entire TIER1 and TIER2 load to the servers themselves, currently there is no need for it and the Compellent serves this need more than well right now.

The new hardware infrastructure will include new switches and firewalls as well. And all these are being set up in new premises, so everything is new. With this we will expect a few maintenance breaks during the upcoming months where services are being handed over from one site to the other. The down times should be relatively low, since all data is being transferred beforehand and not during the down times.

Software changes:

Currently Coin is using VMware’s vSphere technology to create and run VMs. That’s about to change. Our new spinal cord will be based on OpenNebula (https://opennebula.org/). The swap to this new technology will come at the same time we switch to the new facility with the new hardware. We’ve been working hard to get the robustness and reliability up, matching or even exceeding the one provided by VMware’s products. With open source non-proprietary code we can go deep into the root causes of problems and fix drivers if that’s needed to make our VMs run smoothly without hick-ups. With OpenNebula being KVM based, we can expect new distro support to be available sooner as well. No longer do we need to fall back to saying a new macOS can’t be installed because VMware doesn’t support it. Let’s hope I can hold up to this promise or claim 😉

Performance wise the comparison between VMware and OpenNebula is a bit unfair since they use different underlying hardware, but we can say that builds aren’t going to get any slower by the looks of it.

We’re also working on getting all of our distros more provision scripted. This will make it a lot easier for anyone ( yes, this includes you ) to upgrade the software that’s being run on the VMs. Anyone can access qt5.git/coin/provisioning and modify / add scripts there. Normal code review procedures apply and TIER2 images get updated.

Internally we’ve had 3 different Jenkins instances in the past. We had one for CI that got replaced by Coin a year ago. The remaining two were for release package creation, Creator builds and few others, and the second one was for RTA standing for Release Test Automation where we verified the packages to really install something and examples working etc. Those two Jenkins instances are planned to be merged with Coin at some point in time, but for the time being they are going to stay there for a while. However, we’re improving the backend of how they receive their VMs. They currently compete with Coin in getting hardware resources. In the next weeks this is going to be changed so that Coin creates these VMs. This takes away the race conditions between two back ends, but also gives our Jenkins instances “support” for OpenNebula VMs. Even if this does not show up directly to you as CI users, it should show up with slightly more reliable VM dedication, more effective cleanup of VMs, and at least from the technical perspective we should be more capable of producing packages faster.

I hope you found this interesting. If you have any questions feel free to ask in public or in private. I’ll be happy to answer if I can 😊

Have a great summer!
-Tony

Tony Sarajärvi
CI Tech Lead

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.qt-project.org/pipermail/development/attachments/20170530/96ea6de2/attachment.html>