[Development] Updating x86 SIMD support in Qt

Thiago Macieira thiago.macieira at intel.com
Wed Jan 19 18:17:22 CET 2022


On Wednesday, 19 January 2022 04:23:22 PST Allan Sandfeld Jensen wrote:
> On Mittwoch, 19. Januar 2022 04:01:06 CET Thiago Macieira wrote:
> > 5) for glibc-based Linux, add v3 sub-arch by default
> > 
> > I'd like to raise the default on Linux from baseline to v2 *and* add a v3
> > sub-arch build, as described by point #3 above.
> > 
> > Device-specific Qt builds (Yocto Project, Boot2Qt) would need to turn this
> > off and select a single architecture, if they don't want the extra files.
> 
> I am also sceptical about what we would gain from that. It seems mostly like
> something that could benefit QtCore, so perhaps only do v3 sub-arch there?

That is what I'm proposing: build the extra v3 sub-arch for only a few select
libraries. Off the top of my head, that's QtCore, QtGui and some Qt 3D ones.
Anything that is math-heavy would benefit.
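
As a rough illustration only (this is not Qt code), the level a given machine
would qualify for can be approximated at run time with the GCC/Clang builtin
__builtin_cpu_supports. The helper name x86_64_level is hypothetical, and the
feature lists below are a subset of what each level actually requires:

```cpp
// Hypothetical helper: approximate the x86-64 micro-architecture level of
// the CPU we are running on (1 = baseline, 2 = v2, 3 = v3, 4 = v4).
// Assumption: only a subset of each level's required features is checked.
// Returns 0 when not compiled for x86-64 with GCC or Clang.
int x86_64_level()
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();

    const bool v2 = __builtin_cpu_supports("ssse3")
                 && __builtin_cpu_supports("sse4.1")
                 && __builtin_cpu_supports("sse4.2")
                 && __builtin_cpu_supports("popcnt");

    const bool v3 = v2
                 && __builtin_cpu_supports("avx")
                 && __builtin_cpu_supports("avx2")
                 && __builtin_cpu_supports("bmi")
                 && __builtin_cpu_supports("bmi2")
                 && __builtin_cpu_supports("fma");

    const bool v4 = v3
                 && __builtin_cpu_supports("avx512f")
                 && __builtin_cpu_supports("avx512bw")
                 && __builtin_cpu_supports("avx512cd")
                 && __builtin_cpu_supports("avx512dq")
                 && __builtin_cpu_supports("avx512vl");

    return v4 ? 4 : v3 ? 3 : v2 ? 2 : 1;
#else
    return 0; // unknown: not an x86-64 GCC/Clang build
#endif
}
```

With a v2 default plus a v3 sub-arch build, a machine where this returns 3 or
4 would get the v3 copy of the selected libraries and everything else would
get the v2 one.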

> Does Clear Linux have numbers for what they have gained by using a v3-like
> default?

v3 is not the default, v2 is. But we do build qtbase and qt3d as v2+v3, plus 
quite a few more packages (and a handful are v2+v3+v4). As I mentioned in the 
reply to Lars, just search for benchmarks on phoronix.com.

The benefit isn't a lot in the general case. It's a few percent here and 
there, but it's consistent.

You get far more with dedicated algorithms, which is why I am optimising 
qstring.cpp for AVX512VL. Yesterday, while debugging QTBUG-91739, I got the 
opportunity to benchmark ucstrncmp (QtPrivate::compareStrings). See [1] for 
details, but it showed a 20% improvement on v3 over v2 and an additional 10.7% 
on v4 over v3 (for 28.8% gain overall). Please note that the v2 and v3 code 
are already using the new epilogue code I added in [2] plus all the cross-
platform optimisations listed in that JIRA comment, so this is all already 
better than what you have.

However, note that all strings had the same 90-byte (45-character) length, so 
this is not very representative of the real world. In particular, the AVX512VL 
code should show an extra improvement for strings under 16 characters and this 
wasn't exercised.
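
To make the length caveat concrete, here is a minimal sketch of a benchmark
harness that draws string lengths from a range instead of fixing them at 45
characters. It is not the benchmark from [1]: compareUtf16 is a scalar
stand-in rather than ucstrncmp, and the names Pair, makePairs and nsPerCompare
are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Scalar stand-in for the routine under test; NOT Qt's ucstrncmp.
int compareUtf16(const char16_t *a, const char16_t *b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

struct Pair { std::u16string a, b; };

// Build `count` pairs of equal-length strings whose lengths are drawn
// uniformly from [minLen, maxLen] (minLen must be at least 1). The two
// strings differ only in the last character, so each comparison has to
// scan the full length.
std::vector<Pair> makePairs(std::size_t count, std::size_t minLen,
                            std::size_t maxLen, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> len(minLen, maxLen);
    std::vector<Pair> pairs;
    pairs.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        std::u16string s(len(rng), u'a');
        std::u16string t = s;
        t.back() = u'b';
        pairs.push_back({s, t});
    }
    return pairs;
}

// Run `repeats` passes over the pairs and return the average time per
// comparison in nanoseconds. Results are accumulated into `sink` so the
// compiler cannot discard the calls.
double nsPerCompare(const std::vector<Pair> &pairs, int repeats, long &sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (const Pair &p : pairs)
            sink += compareUtf16(p.a.data(), p.b.data(), p.a.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / (double(repeats) * double(pairs.size()));
}
```

Running it once with makePairs(n, 1, 15, seed) and once with
makePairs(n, 45, 45, seed) would separate the under-16-character case from
the fixed-length case measured above.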

[1] https://bugreports.qt.io/browse/QTBUG-91739?focusedCommentId=641671#comment-641671
[2] https://codereview.qt-project.org/c/qt/qtbase/+/390698
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering




