[Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)

Thiago Macieira thiago.macieira at intel.com
Mon Nov 21 03:38:08 CET 2022


On Thursday, 17 November 2022 10:56:22 PST Thiago Macieira wrote:
> The algorithms available are:
> * baseline SSE2: no comparisons

I realised yesterday that, since there will be no benchmarking to prove that 
the new SSE2 code is better than the old one, it is by definition ready. So 
I've rebased, reordered the series, and pushed only the SSE2 portion.

The changes start at https://codereview.qt-project.org/c/qt/qtbase/+/386952 
("QString: replace #if with if constexpr...") and end at 
https://codereview.qt-project.org/c/qt/qtbase/+/386952 
("QString::toLatin1: do the same as..."). The first six commits are merely 
clean-ups and reorganisation.
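The mail doesn't show the SSE2 code itself. For readers who haven't seen this kind of code, here is a minimal sketch of one SSE2 technique relevant to QString::toLatin1: narrowing UTF-16 to Latin-1 eight code units at a time. It is an illustration under stated assumptions, not the code in those changes. `utf16ToLatin1` is a hypothetical name, and the '?' substitution mirrors what QString::toLatin1() does for characters above U+00FF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Hypothetical helper: narrow UTF-16 code units to Latin-1, replacing
// anything above U+00FF with '?'.
std::string utf16ToLatin1(const char16_t *src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::string out(len, '\0');
    std::size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
    const __m128i highMask = _mm_set1_epi16(static_cast<short>(0xFF00));
    const __m128i question = _mm_set1_epi16('?');
    const __m128i zero = _mm_setzero_si128();
    for (; i + 8 <= len; i += 8) {
        // Unaligned load of 8 UTF-16 code units (16 bytes).
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        // 0xFFFF in each lane whose high byte is zero, i.e. fits in Latin-1.
        __m128i fits = _mm_cmpeq_epi16(_mm_and_si128(v, highMask), zero);
        // Keep the lanes that fit, substitute '?' in the others.
        v = _mm_or_si128(_mm_and_si128(fits, v), _mm_andnot_si128(fits, question));
        // Every lane is now <= 0xFF, so the unsigned-saturating pack is exact.
        __m128i packed = _mm_packus_epi16(v, v);
        // Store the low 8 bytes (one byte per input code unit).
        _mm_storel_epi64(reinterpret_cast<__m128i *>(&out[i]), packed);
    }
#endif
    // Scalar tail, and the whole string on targets without SSE2.
    for (; i < len; ++i)
        out[i] = src[i] > 0xFF ? '?' : static_cast<char>(src[i]);
    return out;
}
```

The mask-and-blend step is needed because _mm_packus_epi16 saturates signed 16-bit values: U+0100..U+7FFF would become 0xFF and U+8000 and above would become 0x00, rather than '?'.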

I'll defer the AVX2 and AVX512VL improvements to 6.6.

Meanwhile, I did make some progress on upping our default minimum sub-arch 
targets. For the long discussion, see the thread at
https://lists.qt-project.org/pipermail/development/2022-March/042320.html

But the short story is:
* On all x86-64 builds, the new default will be the v2 sub-architecture, which 
turns 14 years old this month, is already the minimum on all x86-64 Android 
devices and Macs, and is the new minimum on Red Hat Enterprise Linux 9. Users 
can override this up or down with the new QT_BUILD_SUBARCH variable.
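The sub-architecture levels are cumulative feature sets defined in the x86-64 psABI (recent GCC and Clang accept them as -march=x86-64-v2 through -march=x86-64-v4). The mail doesn't say how QT_BUILD_SUBARCH maps to compiler flags, so the sketch below only illustrates what each level means: it classifies a set of CPU feature names into the highest level whose requirements are all met. `x86_64Level` is a hypothetical helper, and the lower-case feature spellings are an illustrative choice, not any particular tool's naming.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns 0 if even the v1 baseline is incomplete,
// otherwise the highest psABI level (1-4) whose features are all present.
int x86_64Level(const std::set<std::string> &cpuFeatures)
{
    // Each entry lists only the features a level adds on top of the previous one.
    static const std::vector<std::vector<std::string>> levels = {
        // v1: the original AMD64 baseline
        {"cmov", "cx8", "fpu", "fxsr", "mmx", "osfxsr", "sce", "sse", "sse2"},
        // v2: roughly the Nehalem generation (2008)
        {"cx16", "lahf_lm", "popcnt", "sse3", "sse4_1", "sse4_2", "ssse3"},
        // v3: roughly the Haswell generation (Apple's "x86-64h")
        {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt", "movbe", "osxsave"},
        // v4: the common AVX-512 subset
        {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
    };
    int level = 0;
    for (const auto &required : levels) {
        for (const auto &feature : required)
            if (cpuFeatures.count(feature) == 0)
                return level; // a missing feature caps the level here
        ++level;
    }
    return level;
}
```

Because the levels are cumulative, a CPU with AVX-512 but no MOVBE would still classify as v2, which is why a level is a stricter statement than any single feature check.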

* On Macs, the new default will be the v3 sub-architecture (Apple calls it 
"x86-64h") and can similarly be overridden with either that variable or the 
CMAKE_OSX_ARCHITECTURES variable. It should be possible to extend my code to 
do both x86-64 and x86-64h multiarch on macOS, but I don't plan on spending 
time on this, because ALL currently supported Macs can run AVX2.

* On Linux, we gain the ability to create multi-arch builds of modules when 
compiled to shared libraries. The default on x86-64 will be to build the v2 
and v3 sub-architectures. The CMake variable again allows you to add v1 and 
v4, though v1 + v2 only works with glibc 2.33 (Feb 2021) and later. All other 
combinations have worked since 2.28 (Aug 2018).
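One way such multi-arch libraries can be found at run time: starting with glibc 2.33, the dynamic loader looks in the glibc-hwcaps/x86-64-v4, x86-64-v3 and x86-64-v2 subdirectories of each library directory, most capable first and only for levels the CPU supports, before falling back to the directory itself. The sketch below models that choice as a pure function. It assumes the lowest built level is installed in the plain library directory; `chooseLibraryDir` is hypothetical, and the mail doesn't describe the exact install layout Qt uses, nor the mechanism for glibc versions before 2.33.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the glibc-hwcaps lookup for one library file.
// cpuLevel: highest x86-64 level the CPU supports (1-4).
// installedLevels: levels for which a copy exists under glibc-hwcaps/.
std::string chooseLibraryDir(const std::string &libdir, int cpuLevel,
                             const std::vector<int> &installedLevels)
{
    // Most capable subdirectory first, as the loader does.
    for (int level = 4; level >= 2; --level) {
        bool installed = std::find(installedLevels.begin(), installedLevels.end(),
                                   level) != installedLevels.end();
        if (level <= cpuLevel && installed)
            return libdir + "/glibc-hwcaps/x86-64-v" + std::to_string(level);
    }
    // Fall back to the baseline copy in the directory itself.
    return libdir;
}
```

For the proposed default (v2 in the plain directory, v3 as the extra copy), a v4-capable machine would load the v3 build and a v2-only machine the baseline. On glibc 2.33 or later, running the loader with --help (e.g. /lib64/ld-linux-x86-64.so.2 --help) lists which glibc-hwcaps subdirectories it considers supported on the current CPU.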

* The option can be controlled per module, so Linux distributors could choose 
to do a dual-, triple-, or (in Debian's case) quadruple-arch build of qtbase, 
qtdeclarative and qt3d, but not the other modules.

I've just finished a qtbase build on Linux with two sub-architectures, and a 
symbol comparison of all the resulting libraries showed no differences. 
Tomorrow I will test all other modules (except qtwebengine). The code is ugly, 
so I'd appreciate guidance from the CMake experts. I've already submitted a 
few preliminary clean-ups.
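The mail doesn't say how the symbol comparison was done. One plausible approach, sketched here as an assumption, is to capture `nm -D --defined-only` output for each sub-architecture's copy of a library and compare the sets of (type, name) pairs, ignoring addresses, which are expected to differ between builds. `definedSymbols` and `symbolDifferences` are hypothetical helpers operating on that text.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parse lines of the form "<address> <type> <name>" and keep "type name".
std::set<std::string> definedSymbols(const std::string &nmOutput)
{
    std::set<std::string> symbols;
    std::istringstream in(nmOutput);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string address, type, name;
        if (fields >> address >> type >> name)
            symbols.insert(type + ' ' + name); // address deliberately dropped
    }
    return symbols;
}

// Symbols present in one build but not the other (symmetric difference).
std::vector<std::string> symbolDifferences(const std::string &a, const std::string &b)
{
    const std::set<std::string> sa = definedSymbols(a);
    const std::set<std::string> sb = definedSymbols(b);
    std::vector<std::string> diff;
    std::set_symmetric_difference(sa.begin(), sa.end(), sb.begin(), sb.end(),
                                  std::back_inserter(diff));
    return diff;
}
```

An empty result for every library pair corresponds to the "no differences" outcome: both copies export the same ABI, so either can be loaded in place of the other.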

I only implemented multi-arch for modules when compiled as shared libraries. 
There's currently no solution for multi-arch binaries on Linux[*], so there's 
no point in making it work for modules built as static libraries right now. I 
might revisit this for non-module static libraries. QPluginLoader can load 
multi-arch plugins, but right now they're not worth it; plugins can instead do 
what the qxcb plugin did and move their functionality into a library.

[*] I had an idea an hour ago: thinking about the qxcb plugin, I remembered 
the old KDE Brockenbores solution.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
