[Development] Updating x86 SIMD support in Qt
Thiago Macieira
thiago.macieira at intel.com
Wed Jan 19 18:17:22 CET 2022
On Wednesday, 19 January 2022 04:23:22 PST Allan Sandfeld Jensen wrote:
> On Mittwoch, 19. Januar 2022 04:01:06 CET Thiago Macieira wrote:
> > 5) for glibc-based Linux, add v3 sub-arch by default
> >
> > I'd like to raise the default on Linux from baseline to v2 *and* add a v3
> > sub-arch build, as described by point #3 above.
> >
> > Device-specific Qt builds (Yocto Project, Boot2Qt) would need to turn this
> > off and select a single architecture, if they don't want the extra files.
>
> I am also sceptical about what we would gain from that. It seems mostly like
> something that could benefit QtCore, so perhaps only do v3 sub-arch there?
That is what I'm proposing: build the extra v3 sub-arch for only a few select
libraries. Off the top of my head, that's QtCore, QtGui and some of the Qt 3D
ones. Anything that is math-heavy would benefit.
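
For illustration only, here is a rough sketch of how the levels discussed here
differ in terms of CPU features. It is not part of the proposal: the function
name approxX86_64Level is hypothetical, it relies on the GCC/Clang builtin
__builtin_cpu_supports, and it checks only a subset of the features each
x86-64 psABI level requires (for example, it skips CMPXCHG16B for v2 and
F16C, LZCNT and MOVBE for v3), so treat the result as an approximation.

```cpp
// Approximate the x86-64 micro-architecture level (1 = baseline .. 4) of the
// CPU we are running on. Hypothetical helper; checks only a subset of the
// features that each level formally requires. Returns 0 when the check is
// not available (non-x86-64 target or a compiler without the builtins).
int approxX86_64Level()
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    int level = 1; // baseline x86-64 always includes SSE2

    // v2: roughly the SSE4.2 + POPCNT generation
    if (!(__builtin_cpu_supports("sse3") && __builtin_cpu_supports("ssse3")
          && __builtin_cpu_supports("sse4.1") && __builtin_cpu_supports("sse4.2")
          && __builtin_cpu_supports("popcnt")))
        return level;
    level = 2;

    // v3: roughly the AVX2 + FMA + BMI generation
    if (!(__builtin_cpu_supports("avx") && __builtin_cpu_supports("avx2")
          && __builtin_cpu_supports("fma") && __builtin_cpu_supports("bmi")
          && __builtin_cpu_supports("bmi2")))
        return level;
    level = 3;

    // v4: the AVX-512 F/BW/CD/DQ/VL subset
    if (!(__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512bw")
          && __builtin_cpu_supports("avx512cd") && __builtin_cpu_supports("avx512dq")
          && __builtin_cpu_supports("avx512vl")))
        return level;
    return 4;
#else
    return 0;
#endif
}
```

With a sub-arch build, an application would not perform a check like this
itself; the point is only to show which feature sets separate a v2 library
from its v3 or v4 variant.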
> Does Clear Linux have numbers for what they have gained by using a v3-like
> default?
v3 is not the default, v2 is. But we do build qtbase and qt3d as v2+v3, plus
quite a few more packages (and a handful are v2+v3+v4). As I mentioned in the
reply to Lars, just search for benchmarks on phoronix.com.
The benefit isn't a lot in the general case. It's a few percent here and
there, but it's consistent.
You get far more with dedicated algorithms, which is why I am optimising
qstring.cpp for AVX512VL. Yesterday, while debugging QTBUG-91739, I got the
opportunity to benchmark ucstrncmp (QtPrivate::compareStrings). See [1] for
details, but it showed a 20% improvement on v3 over v2 and an additional 10.7%
on v4 over v3 (for 28.8% gain overall). Please note that the v2 and v3 code
are already using the new epilogue code I added in [2] plus all the
cross-platform optimisations listed in that JIRA comment, so this is all already
better than what you have.
However, note that all strings had the same 90-byte (45-character) length, so
this is not very representative of the real world. In particular, the AVX512VL
code should show an extra improvement for strings under 16 characters and this
wasn't exercised.
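
To make the "vector loop plus epilogue" shape concrete, here is a minimal
sketch of a UTF-16 comparison. It is not the code in qstring.cpp and it does
not use AVX512VL: it swaps in plain SSE2 for the main loop and a scalar loop
for the epilogue. The name compareUtf16 is hypothetical, and the sketch
assumes both buffers have the same length n and a GCC- or Clang-compatible
compiler (for __builtin_ctz).

```cpp
#include <cstddef>
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
#  include <emmintrin.h>
#  define SKETCH_USE_SSE2 1
#endif

// Hypothetical helper, not Qt's ucstrncmp: compares n UTF-16 code units and
// returns a negative value, zero, or a positive value.
int compareUtf16(const char16_t *a, const char16_t *b, std::size_t n)
{
    std::size_t i = 0;
#ifdef SKETCH_USE_SSE2
    // Main loop: 8 code units (16 bytes) per iteration, unaligned loads.
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        // Every equal 16-bit lane contributes two set bits to the byte mask.
        unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi16(va, vb)));
        if (mask != 0xFFFFu) {
            // Lowest clear bit is the first differing byte; halve it to get
            // the index of the differing code unit within this block.
            unsigned idx = static_cast<unsigned>(
                __builtin_ctz(~mask & 0xFFFFu)) / 2;
            return int(a[i + idx]) - int(b[i + idx]);
        }
    }
#endif
    // Epilogue: the remaining 0..7 code units (or everything without SSE2).
    for (; i < n; ++i) {
        if (a[i] != b[i])
            return int(a[i]) - int(b[i]);
    }
    return 0;
}
```

In this sketch, any string shorter than 8 characters never enters the vector
loop at all, which is the kind of short-string case where masked loads could
help and which the benchmark above did not exercise.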
[1] https://bugreports.qt.io/browse/QTBUG-91739?focusedCommentId=641671#comment-641671
[2] https://codereview.qt-project.org/c/qt/qtbase/+/390698
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering