[Development] Sub-arch optimisations (was: How qAsConst and qExchange lead to qNN)
Thiago Macieira
thiago.macieira at intel.com
Mon Nov 21 03:38:08 CET 2022
On Thursday, 17 November 2022 10:56:22 PST Thiago Macieira wrote:
> The algorithms available are:
> * baseline SSE2: no comparisons
I realised yesterday that, since there will be no benchmarking to prove that
the new SSE2 code is better than the old one, it is by definition ready. So
I've rebased, reordered the SSE2 portion only and pushed.
The changes start at https://codereview.qt-project.org/c/qt/qtbase/+/386952
("QString: replace #if with if constexpr...") and end at
https://codereview.qt-project.org/c/qt/qtbase/+/386952
("QString::toLatin1: do the same as..."). The first six commits are merely
clean-ups and reorganisation.
I'll defer the AVX2 and AVX512VL improvements to 6.6.
Meanwhile, I did make some progress on upping our default minimum sub-arch
targets. For the long discussion, see the thread at
https://lists.qt-project.org/pipermail/development/2022-March/042320.html
But the short story is:
* On all x86-64 builds, the new default will be the v2 sub-architecture, which
turns 14 years old this month, is already the minimum on all x86-64 Android
and Macs, and is the new minimum on Red Hat 9. The user can override this up
or down with the new QT_BUILD_SUBARCH variable.
* On Macs, the new default will be the v3 sub-architecture (Apple calls it
"x86-64h") and can similarly be overridden with either that variable or the
CMAKE_OSX_ARCHITECTURES variable. It should be possible to extend my code to
do both x86-64 and x86-64h multiarch on macOS, but I don't plan on spending
time on this, because ALL currently supported Macs can run AVX2.
* On Linux, we gain the ability to create multi-arch builds of modules when
compiled to shared libraries. The default on x86-64 will be to build the v2
and v3 sub-architectures. The CMake variable again allows you to add v1 and
v4, though v1 + v2 only works with glibc 2.33 (Feb 2021) and up. All other
combinations work since 2.28 (Feb 2018).
* The option can be controlled per module, so Linux distributors could choose
to do a dual-, triple-, or (in Debian's case) quadruple-arch build of qtbase,
qtdeclarative and qt3d, but not the other modules.
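
To make the v1 to v4 levels in the list above concrete, here is a minimal
Python sketch, not the Qt build code. It assumes the feature sets from the
x86-64 psABI micro-architecture levels, spelled the way Linux /proc/cpuinfo
spells them ("pni" is SSE3, "abm" is LZCNT, and "xsave" stands in for OSXSAVE).
The names LEVEL_FLAGS, cpu_level and pick_variant are hypothetical, and the
"highest built level the CPU supports" rule is my assumption about how a
multi-arch install would be used, not something stated in this post.

```python
# Extra features each level adds on top of the previous one (v1 is assumed
# to be present on every x86-64 CPU, so it is not listed).
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def cpu_level(flags):
    """Return the highest level (1..4) whose features, and those of every
    lower level, are all present in `flags`."""
    have = set(flags)
    level = 1
    for candidate in (2, 3, 4):
        if not LEVEL_FLAGS[candidate] <= have:
            break  # levels are cumulative: stop at the first gap
        level = candidate
    return level


def pick_variant(built_levels, flags):
    """Pick the highest built level that the CPU can run, or None if the
    CPU is below every level that was built."""
    limit = cpu_level(flags)
    usable = [lvl for lvl in built_levels if lvl <= limit]
    return max(usable) if usable else None


# A CPU with every v2 and v3 feature, and a default v2 + v3 build.
v3_cpu = LEVEL_FLAGS[2] | LEVEL_FLAGS[3]
print(cpu_level(v3_cpu))                     # 3
print(pick_variant([2, 3], v3_cpu))          # 3
print(pick_variant([2, 3], LEVEL_FLAGS[2]))  # 2
print(pick_variant([2, 3], set()))           # None
```

The last line shows why adding v1 to the build matters for old hardware: with
only v2 and v3 built, a v1-only CPU has nothing it can load.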
I've just finished a qtbase build on Linux with two sub-architectures and the
symbol comparison of all the resulting libraries has shown zero difference.
Tomorrow I will test all other modules (except qtwebengine). The code is ugly,
so I'd appreciate guidance from the CMake experts. I've already submitted a
few preliminary clean-ups.
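
For anyone who wants to repeat the symbol comparison on their own build, a
rough Python sketch follows. It is not the script I used; it assumes you have
already captured text in the three-column "address type name" form that
`nm -D --defined-only` prints, and the function names and sample symbols are
made up for illustration. Addresses are ignored on purpose, since they are
expected to differ between sub-architecture builds.

```python
def exported_symbols(nm_output):
    """Parse 'address type name' lines into a set of (type, name) pairs.
    Lines that do not have exactly three fields are skipped."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            _address, sym_type, name = parts
            symbols.add((sym_type, name))
    return symbols


def symbol_diff(first, second):
    """Return (only_in_first, only_in_second) as sorted lists."""
    return sorted(first - second), sorted(second - first)


# Hypothetical nm output for the same library built for two sub-architectures.
v2_build = """\
0000000000001139 T _Z3foov
0000000000004010 B counter
"""
v3_build = """\
0000000000002250 T _Z3foov
0000000000008020 B counter
"""

only_v2, only_v3 = symbol_diff(exported_symbols(v2_build),
                               exported_symbols(v3_build))
print(only_v2, only_v3)  # [] []
```

Two empty lists mean both builds export the same set of symbols with the same
symbol types.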
I only implemented multi-arch for modules when compiled as shared libraries.
There's currently no solution for multi-arch binaries on Linux[*], so there's
no sense in making that solution work for modules as static libraries right
now. I might revisit this for non-module static libraries. QPluginLoader can
load multi-arch plugins, but right now they're not worth it; plugins can do
what the qxcb plugin did and move their functionality into a library.
[*] I had an idea an hour ago while thinking about the qxcb plugin, when I
remembered the old KDE Brockenbores solution.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Cloud Software Architect - Intel DCAI Cloud Engineering