[Development] Updating x86 SIMD support in Qt

Lars Knoll lars.knoll at qt.io
Wed Jan 19 09:13:32 CET 2022


Hi Thiago,

I’m absolutely in favour of upping the SIMD support in Qt. Compilers support everything we need, and we should make better use of that.

The main thing I’m wondering about is how much performance we gain from a multi-arch build of Qt for different x86_64 architectures, as opposed to building for, say, v2 and detecting and using AVX and AVX512 at runtime. I would assume we gain little, as there are very few places where the compiler’s auto-vectorizer will be able to emit AVX instructions, but I might be wrong here.

AVX is only used by a couple of classes in Qt Core and the drawhelper in Qt Gui. Qt Gui already does runtime detection, so it would be only about adding that to the methods in Qt Core. 
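
To make the runtime-detection alternative concrete, here is a minimal sketch of per-function dispatch. It assumes GCC or Clang on x86 and uses the compiler builtin `__builtin_cpu_supports` together with the `target` function attribute; it does not use Qt's own detection code, and the names `sum`, `sumScalar` and `sumAvx2` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  include <immintrin.h>
#  define SKETCH_HAVE_X86_DISPATCH 1
#endif

// Portable fallback, compiled for whatever baseline the build targets.
static std::uint32_t sumScalar(const std::uint32_t *data, std::size_t n)
{
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

#ifdef SKETCH_HAVE_X86_DISPATCH
// Only this function is compiled with AVX2 enabled; the rest of the
// translation unit stays at the baseline.
__attribute__((target("avx2")))
static std::uint32_t sumAvx2(const std::uint32_t *data, std::size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256i chunk =
            _mm256_loadu_si256(reinterpret_cast<const __m256i *>(data + i));
        acc = _mm256_add_epi32(acc, chunk);
    }

    // Horizontal sum of the eight 32-bit lanes, then the scalar tail.
    std::uint32_t lanes[8];
    _mm256_storeu_si256(reinterpret_cast<__m256i *>(lanes), acc);
    std::uint32_t total = 0;
    for (std::uint32_t lane : lanes)
        total += lane;
    for (; i < n; ++i)
        total += data[i];
    return total;
}
#endif

// Public entry point: checks the CPU once, then picks an implementation.
std::uint32_t sum(const std::uint32_t *data, std::size_t n)
{
    assert(data != nullptr || n == 0);
#ifdef SKETCH_HAVE_X86_DISPATCH
    static const bool hasAvx2 = __builtin_cpu_supports("avx2") != 0;
    if (hasAvx2)
        return sumAvx2(data, n);
#endif
    return sumScalar(data, n);
}
```

The cost of this approach is one predictable branch per call, which is why it only pays off for functions that do a meaningful amount of work per invocation.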

A couple more comments inline below.

> On 19 Jan 2022, at 04:01, Thiago Macieira <thiago.macieira at intel.com> wrote:
> 
> For Qt 6.4, I'd like to propose we change the way we detect and enable SIMD 
> support. TL;DR:
> 
> * Assume all compilers support 5-year-old stuff
> * Up the minimum CPU for Linux, Windows and macOS/x86
> * Fix macOS Universal builds to use the minimum
> * Add an option to cmake to choose a minimum matching one of the Linux x86-64 
>   ABI revisions
>   * Make it easy to build QtCore, QtGui and Qt3D multi-arch on Linux
> 
> Long version:
> 
> 1) assume all compilers support what we need
> 
> Our current tests for compiler support go all the way back to SSE2, which is 
> mandatory on x86-64. While testing some changes, I've confirmed that all 
> compilers in the CI support x86 CPU features matching the Intel Cannon Lake 
> architecture, which is more than we need, except for the QCC compiler missing 
> one intrinsic that we can work around.
> 
> I've also found that macOS universal builds, WASM, Android and maybe some more 
> are improperly detecting support. Specifically for universal builds, what we 
> detect depends on the order in which you specify the architectures. This is 
> surprising at best and buggy at worst.
> 
> I propose we remove the tests for the intrinsics of each individual CPU 
> feature. Instead, let's just assume they all have everything up to 2016. This 
> will shorten cmake time a little and fix the macOS universal builds. It'll 
> also change how 32-bit non-SSE2 builds are selected (see below).
> 
> The change https://codereview.qt-project.org/c/qt/qtbase/+/386738 is going in 
> this direction but retains a test (all or nothing). I'm proposing now we 
> remove the test completely and just assume.

I’m fine with that, I don’t think we need to support a compiler that doesn’t support those. 

Can we do the same thing for NEON at the same time, btw? While there are some platforms that don’t support NEON, I believe all compilers do support it.

> Question:
> - the QT_COMPILER_SUPPORTS_xxx macros are in qconfig.h (public config). Do we 
>  keep compatibility? We can easily just move them to qprocessordetection.

These are also to some extent used to differentiate between SSE and NEON. I think we can hardcode those in qprocessordetection for source compatibility.
> 
> 2) add options to select the target architecture revision
> 
> Linux established 3 new revisions of the architecture:
> * x86-64 v1 (baseline): SSE2 support
> * x86-64 v2: baseline + SSE3, SSSE3, SSE4.1, SSE4.2
> * x86-64 v3: v2 + AVX + AVX2 + FMA + BMI + F16C
> * x86-64 v4: v3 + AVX512F + BW + CD + DQ + VL
> 
> For i386, we can consider a "v0" of the non-SSE2 original baseline from the 
> 1980s.

Fine for me. I don’t really care that much about i386, as it’s quickly dying out and we’re not providing any binaries for it anymore.
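
As a rough sketch of how the revision levels listed above could be derived at compile time, the snippet below maps the feature macros that GCC and Clang predefine (`__SSE2__`, `__SSE4_2__`, `__AVX2__`, `__FMA__`, `__AVX512F__` and friends) to a level number. The function names are hypothetical, only a representative feature or two is checked per level rather than the full list, and MSVC would need different macros.

```cpp
#include <cassert>

// Highest revision implied by a set of feature flags.
// 0 stands for the pre-SSE2 i386 "v0" mentioned above.
constexpr int x86Level(bool sse2, bool sse42, bool avx2AndFma, bool avx512)
{
    if (!sse2)
        return 0;
    if (!sse42)
        return 1;
    if (!avx2AndFma)
        return 2;
    if (!avx512)
        return 3;
    return 4;
}

// Feature flags as seen by the compiler for this translation unit.
#if defined(__SSE2__)
constexpr bool kSse2 = true;
#else
constexpr bool kSse2 = false;
#endif
#if defined(__SSE4_2__)
constexpr bool kSse42 = true;
#else
constexpr bool kSse42 = false;
#endif
#if defined(__AVX2__) && defined(__FMA__)
constexpr bool kAvx2AndFma = true;
#else
constexpr bool kAvx2AndFma = false;
#endif
#if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512VL__)
constexpr bool kAvx512 = true;
#else
constexpr bool kAvx512 = false;
#endif

// Level this translation unit was compiled for (0 on non-x86 targets too,
// since none of the macros are defined there).
constexpr int compiledX86Level()
{
    return x86Level(kSse2, kSse42, kAvx2AndFma, kAvx512);
}

// Short name such as "v2" for a level in the range 0..4.
const char *x86LevelName(int level)
{
    static const char *const names[] = { "v0", "v1", "v2", "v3", "v4" };
    assert(level >= 0 && level <= 4);
    return names[level];
}
```

With GCC or Clang, building with `-march=x86-64-v3` (available in recent releases) would be expected to make `compiledX86Level()` return 3 under these assumptions.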
> 
> I propose adding a CMake option to make it easy to opt in to one of those. 
> Yes, you can just set CMAKE_C(XX)_FLAGS_{RELEASE,DEBUG,RELWITHDEBINFO}, so this 
> part would be convenience.
> 
> For the default, see #4.
> 
> 3) add a way to have multi-arch glibc-based Linux builds
> 
> The revisions also match subdirectory searches by the Linux dynamic linker. 
> The subdirectories "x86-64-v2", "x86-64-v3" and "x86-64-v4" are new in glibc 
> 2.33, but glibc has supported "haswell" (for v3) and "avx512_1" (for v4) for a 
> number of years prior to that.
> 
> The proposal is to allow the user to specify more than one architecture in the 
> list above. We can query the dynamic linker to find out if it supports the new 
> names and, if not, use the old ones.
> 
> For example, if I specified QT_X86_SUBARCH="v2;v3;v4", it would compile QtCore 
> three times. The build products would be:
>  lib/libQt6Core.so.6.4.0
>  lib/haswell/libQt6Core.so.6.4.0	OR
> 	lib/glibc-hwcaps/x86-64-v3/libQt6Core.so.6.4.0
>  lib/haswell/avx512_1/libQt6Core.so.6.4.0	OR
> 	lib/glibc-hwcaps/x86-64-v4/libQt6Core.so.6.4.0
> with their matching symlinks.
> 
> This would apply to only a few select libraries. I'm thinking QtCore, QtGui, 
> QtQml and some of the Qt3D libraries.
> 
> I don't currently see a need to do this for any plugins and there is no 
> standardised way to name them anyway.
> 
> This would replace the current "-mno-sse2" option that is required to turn 
> i386 32-bit builds from SSE2 support back to the original baseline. For a 
> 32-bit build, one would use QT_X86_SUBARCH="v0;v1" and get both baseline and 
> the SSE2-optimised version.

See my comment above. We also need to think about non-Linux platforms. Multi-arch is difficult on Windows as far as I know, so a v2 baseline build and runtime detection might be preferable.
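
For reference, the directory layout proposed in point 3 could be expressed as a small mapping like the one below. This is only an illustration of the naming scheme quoted above, not build-system code; `subArchLibDir` and its parameters are hypothetical, and the sketch assumes v2 is the lowest level in the list and therefore lands directly in lib/.

```cpp
#include <cassert>
#include <string>

// Install directory for one sub-arch build of a library.
//  level          : x86-64 revision (2, 3 or 4 in this sketch)
//  hasGlibcHwcaps : true if the dynamic linker knows the glibc-hwcaps
//                   directory names (new in glibc 2.33), false if only the
//                   older "haswell" / "avx512_1" names are usable
std::string subArchLibDir(int level, bool hasGlibcHwcaps)
{
    assert(level >= 2 && level <= 4);
    switch (level) {
    case 3:
        return hasGlibcHwcaps ? "lib/glibc-hwcaps/x86-64-v3" : "lib/haswell";
    case 4:
        return hasGlibcHwcaps ? "lib/glibc-hwcaps/x86-64-v4"
                              : "lib/haswell/avx512_1";
    default:
        // The lowest requested level is the regular library location.
        return "lib";
    }
}
```

For example, `subArchLibDir(3, false) + "/libQt6Core.so.6.4.0"` yields the "haswell" path from the list of build products above.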

> 
> 4) up the defaults from where they are today
> 
> Today, your default Qt build will always target the x86-64 baseline[*], 
> including for i386, even though no CPU released in the last 9 years fails to 
> meet the next level. I'd like to request we up that minimum.
> 
> By default, I'd like us to produce x86-64 v2 code, which is SSE4. There are a 
> number of optimisations in QtCore and QtGui that get automatically enabled. In 
> particular, qstring.cpp does not do runtime detection, so you've been leaving 
> performance on the table on your computers, unless you build Qt from source 
> yourself and set -march= to match your CPU.

I think we can probably go for v2 as the baseline for our binaries. If you need something that runs on older hardware, you’d then have to do your own build.
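
To illustrate what "automatically enabled" means for code without runtime detection, here is a hedged sketch of a compile-time gated fast path. It is not the code in qstring.cpp; `isAscii` is a made-up helper, and the only assumption is that the compiler defines `__SSE4_1__` when the build targets v2 or higher (as GCC and Clang do).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE4_1__)
#  include <smmintrin.h>
#endif

// Returns true if every UTF-16 code unit in [str, str + len) is below 0x80.
bool isAscii(const char16_t *str, std::size_t len)
{
    assert(str != nullptr || len == 0);
    std::size_t i = 0;

#if defined(__SSE4_1__)
    // Only compiled in when the baseline includes SSE4.1, so a baseline
    // (v1) build never sees this loop and always runs the scalar code.
    const __m128i nonAsciiBits = _mm_set1_epi16(short(-128)); // 0xFF80 per lane
    for (; i + 8 <= len; i += 8) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i *>(str + i));
        // _mm_testz_si128 returns 1 when (chunk & nonAsciiBits) is all zero.
        if (!_mm_testz_si128(chunk, nonAsciiBits))
            return false;
    }
#endif

    // Scalar tail, and the whole loop when SSE4.1 is not in the baseline.
    for (; i < len; ++i) {
        if (str[i] > 0x7F)
            return false;
    }
    return true;
}
```

Raising the default to v2 would switch on the vector loop in a function like this without any source change, which is the effect described for the binaries above.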
> 
> I'm told that Red Hat 9 will increase their minimum to v2, which is why the 
> architecture selection features now exist.
> 
> This would apply to source and binary builds from qt.io. Android and macOS 
> would be unaffected because they already default to this level.
> 
> Question:
> - iOS simulator builds are x86, but currently only SSE2. Does anyone know if 
> raising to SSE4, which *ALL* 64-bit Mac machines support, would be a problem?
> 
> 5) for glibc-based Linux, add v3 sub-arch by default
> 
> I'd like to raise the default on Linux from baseline to v2 *and* add a v3 sub-
> arch build, as described by point #3 above.
> 
> Device-specific Qt builds (Yocto Project, Boot2Qt) would need to turn this off 
> and select a single architecture, if they don't want the extra files.

This complicates the build system and deployment in quite a few places and is a Linux-specific solution. Can you give some numbers on how much of an improvement this would give over runtime detection where we have AVX-optimised code?
> 
> 6) for macOS, raise the minimum to v3 (x86_64h)
> 
> macOS has supported an extra architecture called "x86_64h" for some time (the 
> "h" stands for "haswell"). Apple ceased offering macOS updates to processors 
> without AVX2 back with the Mojave release (10.14) in 2018. Since that's the 
> minimum version we require for Qt, it means all Intel-based Macs Qt can run on 
> also support this sub-arch.
> 
> I'd like to do this for all libraries and by default on binaries from qt.io. 
> However, I understand the ARM translation application cannot deal with the AVX 
> instructions, so it would fail to run our default binaries for the 
> applications that couldn't rebuild as ARM. Is it acceptable to require those 
> application developers to rebuild Qt from source?

We do have multi-arch builds that include ARM nowadays, so maybe that’s not a huge issue. But again, if we relied on runtime detection we wouldn’t have this problem in the first place.

To sum it up, I believe it’s probably ok to raise the default build config to v2, but I do wonder how much we gain from a multi-arch build as opposed to doing runtime detection, esp. given that it complicates our build system.

Cheers,
Lars

> 
> If not, I'd like to ask we build the same libraries as enabled for Linux 
> multi-arch with the additional "x86_64h" architecture (that is, triple 
> universal build: "x86_64;x86_64h;arm64"). I'm assuming here that we can use 
> CMake's built-in support for macOS universal builds.
> 
> -- 
> Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel DPG Cloud Engineering
> 
