[Development] Using SSE/NEON in Qt 6

Thu Feb 6 15:15:38 CET 2020

On 20/02/06 02:00, Lars Knoll wrote:
> > On 6 Feb 2020, at 14:36, Lisandro Damián Nicanor Pérez Meyer <perezmeyer at gmail.com> wrote:
> > 
> > On 20/02/06 11:45, Lars Knoll wrote:
> >> Hi,
> >> 
> >> We’ve seen that in a couple of places things like matrix operations are a CPU bottleneck. Being able to provide SSE/NEON optimised versions of some of those operations could help significantly. 
> >> 
> >> On x86/x64, we require SSE2 already anyway, so we should be able to use those unconditionally. On ARM, we can make this a compile time option with a C implementation as the fallback.
> >> 
> >> One problem is, that we can only get full benefit out of those if we can offer them inline. That would basically imply making our qsimd_p.h header public and including that one from qvectornd.h and qmatrixnxn.h (so that we can implement the operations using the SSE/NEON intrinsics). If we do that, we could e.g. implement QVector4D holding a __m128 value (and the neon equivalent on ARM).
> >> 
> >> I personally don’t think including qsimd.h (and implicitly immintrin.h) from our public headers would be a problem, but I’d be happy to hear arguments for/against it.
> > 
> > That might work as long as it's compile-time optional. Let's split both cases here:
> 
> Well, the idea would be that you could do a build without any SIMD instructions if you configure Qt that way. But for the use cases I have in mind, doing some runtime detection and different code paths would probably kill most of the benefit.
> > 
> > # SSEn
> > 
> > SSEn is not present on all architectures. Not all i386 machines support SSE2 for
> > example, and some amd64 do not support more than SSE2 (read below).
> 
> Correct, but CPUs not supporting SSE2 are by now at least 15 years old.

Well, believe it or not, last year there were still industrial 32-bit CPUs
being made without SSE2. That of course does not mean they are disappearing.

> > If some of this becomes mandatory then distributions will certainly not be able
> > to ship Qt 6. On the other hand if it can be decided at build time we could do a
> > double build and ship a non-optimized library in /usr/lib/ and an optimized
> > version in /usr/lib/sse2, /usr/lib/sse4, etc., as the linker knows what to do in
> > those cases.
> 
> SSE2 should not really be a problem, as it’s available on all 64bit capable CPUs.
> > 
> > At least in Debian we do this for qtbase on i386, and have different versions of
> > corelib and gui (the only ones which were directly affected by this).
> 
> Are you also doing this for QtQml? Because we completely disable the QML JIT if the platform doesn’t support SSE2.

Actually no, thanks for the pointer, I have just filed a bug so we can check
this.

> > Of course this might not go well with inlining.
> > 
> > # NEON
> > 
> > On Debian we have arm64, armel and armhf as arm-based supported architectures.
> > The only arch that can support NEON is arm64. It never existed for armel and
> > NEON was optional for µP builders on armhf.
> > 
> > I don't know if one could do the /usr/lib/neon/ linker trick here.
> > 
> > # If we go the linker path route...
> > 
> > If this case is taken then it would be *awesome* to know exactly which libraries
> > do really get a benefit from this, so we only ship those with a double build.
> > 
> > # Other solutions?
> > 
> > If some other solution like the linker path is possible we can definitely
> > discuss it :-)
> 
> You’re looking at this very much from the perspective of a Linux distributor (understandably). Many of our users however specifically build Qt and their application for a certain hardware. So a compile time detection and usage of NEON instructions should not be a problem at all there.

Of course not!

> 
> If you want a generic build that works everywhere, the only choice is of course to turn them off for things that are inline.

A specific switch for inline stuff will certainly do it here.
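
To make that concrete, here is a minimal sketch of what such a switch could
look like in an inline header. All the names (MYLIB_NO_INLINE_SIMD, Vec4,
add) are made up for illustration and are not anything from Qt, and the
compiler macros are the ones GCC and Clang define; other compilers would
need different checks.

```cpp
// Hypothetical build switch: defining MYLIB_NO_INLINE_SIMD forces the plain
// C++ path, which is what a generic distribution build would want.
#if !defined(MYLIB_NO_INLINE_SIMD) && defined(__SSE2__)
#  include <emmintrin.h>   // SSE2 header; also pulls in the SSE float intrinsics
#  define MYLIB_USE_SSE2 1
#elif !defined(MYLIB_NO_INLINE_SIMD) && defined(__ARM_NEON)
#  include <arm_neon.h>
#  define MYLIB_USE_NEON 1
#endif

// Illustrative 4-component vector, not the real QVector4D.
struct Vec4 {
    float v[4];
};

// Component-wise addition; the code path is fixed at compile time, so there
// is no runtime CPU detection and the function can be inlined.
inline Vec4 add(const Vec4 &a, const Vec4 &b)
{
    Vec4 r;
#if defined(MYLIB_USE_SSE2)
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
#elif defined(MYLIB_USE_NEON)
    vst1q_f32(r.v, vaddq_f32(vld1q_f32(a.v), vld1q_f32(b.v)));
#else
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];   // portable fallback
#endif
    return r;
}
```

With something like this, a distribution would build with
-DMYLIB_NO_INLINE_SIMD (again, a made-up name) and get the fallback everywhere,
while people targeting specific hardware would get the SIMD path.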

> > 
> >> As a side note: SSE 4.1 offers some nice additional instructions that would simplify some of the operations. Should we keep the minimum requirement for SSE at version 2, or can we raise it to 4.1?
> > 
> > Well, I'm currently running KDE on my machine which only supports SSE2. 10yo
> > machine: yes. But I could not afford a new one so far (and this one still works
> > pretty fine).
> 
> Ok, I guess that’s a vote against requiring anything more than SSE2 :)

:-)

> > # On a related note
> > 
> > So far my impression as a distro maintainer is that people were way more eager
> > to switch from qt4 to qt5 than from qt5 to qt6. If somehow the barrier gets even
> > higher we will have a hard time making people do the switch. Mind you, we
> > removed qt4 from Debian testing just some weeks ago...
> 
> Let’s see, once we have Qt 6 done. And I don’t intend to make this switch harder. There will be a C fallback code path no matter what :)
> 
> I was for now mostly looking for pros/cons of including <immintrin.h> and <arm_neon.h> in our public headers.

ACK, and I'm glad you brought this forward. There are clearly many use cases
around, and if we can somehow have a good alternative for most of them, so much
the better.

Regards, Lisandro.