[Development] Using SSE/NEON in Qt 6

Thiago Macieira thiago.macieira at intel.com
Sat Feb 8 05:00:31 CET 2020


On Thursday, 6 February 2020 03:45:51 PST Lars Knoll wrote:
> Hi,
> 
> We’ve seen that in a couple of places things like matrix operations are a
> CPU bottleneck. Being able to provide SSE/NEON optimised versions of some
> of those operations could help significantly. 
 
> On x86/x64, we require SSE2 already anyway, so we should be able to use
> those unconditionally. On ARM, we can make this a compile time option with
> a C implementation as the fallback.

The x86-64 instruction set mandates SSE2, and it's used anyway because it's the
ABI way of doing floating-point operations. So SSE2 should be a no-brainer on
64-bit.

The question is what to do for 32-bit. I don't think it's very difficult, 
though: if we have the SSE2 code in the header, then it's checked by 
	#ifdef __SSE2__
and there'll be a fallback path anyway. That fallback path will be the non-
SSE2 i386 code path.
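
As a rough sketch of what such a header function could look like (add4 is a
made-up helper for illustration, not anything that exists in Qt):

```cpp
#include <cassert>
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE2 intrinsics (pulls in the SSE ones too)
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for four floats.
inline void add4(const float *a, const float *b, float *out)
{
    assert(a && b && out);
#if defined(__SSE2__)
    // One 128-bit add. Unaligned loads/stores keep the sketch free of
    // alignment requirements on the caller's data.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    // Fallback path: plain C++, which is also what a non-SSE2 i386 build
    // would end up compiling.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Callers see the same function either way; only the build flags decide which
body gets compiled.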

Then there's the question of how to package this. I suggest that we enable 
SSE2 by default on the Windows 32-bit build, if we still have a 32-bit build. 
We don't ship 32-bit Mac or Linux builds. For Linux distributions, this is 
also already solved: build twice and place one lib in /usr/lib and the other 
in /usr/lib/sse2. GNU libc knows how to load this.
 
> One problem is, that we can only get full benefit out of those if we can
> offer them inline. That would basically imply making our qsimd_p.h header
> public and including that one from qvectornd.h and qmatrixnxn.h (so that we
> can implement the operations using the SSE/NEON intrinsics). If we do that,
> we could e.g. implement QVector4D holding a __m128 value (and the neon
> equivalent on ARM).

We should simplify qsimd_p.h. There's some compatibility code there for GCC 
pre-4.9, which we don't support any more in 5.14.

We need to decide whether we want runtime detection in inline headers. If we 
do, then we need an API for that and qCpuHasFeature hasn't maintained ABI. 
Moreover, we'll get requests for features that we don't currently need/use.
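
To illustrate what runtime detection in a header implies, here is a minimal
sketch. It assumes GCC or Clang on x86 and uses the compiler builtin
__builtin_cpu_supports() as a stand-in for qCpuHasFeature; the scale*
functions are hypothetical names, not Qt API:

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#  define SKETCH_X86_DISPATCH 1
#  include <immintrin.h>
#endif

// Baseline path, always available.
static void scaleScalar(float *v, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        v[i] *= k;
}

#ifdef SKETCH_X86_DISPATCH
// Compiled for AVX2 regardless of the -march used for the rest of the file.
__attribute__((target("avx2")))
static void scaleAvx2(float *v, std::size_t n, float k)
{
    const __m256 factor = _mm256_set1_ps(k);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)   // eight floats per iteration
        _mm256_storeu_ps(v + i, _mm256_mul_ps(_mm256_loadu_ps(v + i), factor));
    for (; i < n; ++i)           // scalar tail
        v[i] *= k;
}
#endif

// Picks the code path when called, based on what the CPU reports.
void scale(float *v, std::size_t n, float k)
{
    assert(v != nullptr || n == 0);
#ifdef SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        scaleAvx2(v, n, k);
        return;
    }
#endif
    scaleScalar(v, n, k);
}
```

Whatever plays the role of the feature check here would get compiled into
user code if it lived in an inline header, which is why its API and ABI would
have to be settled first.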

If we don't need runtime detection, then just #include <immintrin.h> or a 
wrapper header to deal with MSVC not defining __SSE2__. qfloat16.h does that.
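
A minimal sketch of such a wrapper, assuming MSVC's documented _M_X64 and
_M_IX86_FP macros; this is not the actual qfloat16.h or qsimd_p.h code, and
hasCompileTimeSse2() is a hypothetical name used only to make the result
visible:

```cpp
// Hypothetical wrapper header contents, for illustration only.
#if defined(_MSC_VER) && !defined(__clang__) && !defined(__SSE2__)
// MSVC never defines __SSE2__ itself. x64 always has SSE2; on 32-bit,
// building with /arch:SSE2 or higher sets _M_IX86_FP to 2.
#  if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#    define __SSE2__ 1
#  endif
#endif

#if defined(__SSE2__)
#  include <immintrin.h>
#endif

// Reports whether this translation unit was compiled with SSE2 enabled.
constexpr bool hasCompileTimeSse2()
{
#if defined(__SSE2__)
    return true;
#else
    return false;
#endif
}
```

After this, the rest of the header can test __SSE2__ the same way on every
compiler.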

We also need to decide whether the macro normalisation that qsimd_p.h does
should be public too. qsimd_p.h defines the GCC/Clang macros that ICC and
MSVC don't (__F16C__, __PCLMUL__, __FMA__, __LZCNT__, etc.).
 
> I personally don’t think including qsimd.h (and implicitly immintrin.h) from
> our public headers would be a problem, but I’d be happy to hear arguments
> for/against it.

immintrin.h shouldn't be. qsimd_p.h only after a clean-up and especially after
nailing down the qCpuHasFeature() API.

> As a side note: SSE 4.1 offers some nice additional instructions that would
> simplify some of the operations. Should we keep the minimum requirement for
> SSE at version 2, or can we raise it to 4.1?

I don't recommend it. The gain compared to SSE2 is not enough and it brings 
headaches. Instead, write code paths that use __AVX2__. Lots of modern CPUs 
support it, and AVX2 builds can be co-installed with the main version on Linux
(like Clear Linux does and I've done for my openSUSE package) and on Macs.

It doesn't need to be #ifdef __AVX2__. Each code path should check for the
minimum feature that it needs, like qhash.cpp and qstring.cpp do. That has the
following benefits:

1) Mac 64-bit and Clear Linux have SSE4.2 as a baseline, so they'll be enabled 
even for the base version

2) People building from sources with -march=native (Gentoo users?) on older,
pre-AVX2 machines will get the benefit too

3) For matrix multiplication, I bet that __FMA__ is actually a much more 
important gain than anything else and it comes with the same architecture as 
__AVX2__. Use qCpuHasFeature(ArchHaswell)
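
A sketch of keying each path on the minimum feature it needs, using a
multiply-add as the example (mulAdd4 is a hypothetical helper, not Qt code).
The FMA path tests __FMA__ rather than __AVX2__, since that is the only
feature it uses:

```cpp
#include <cassert>
#if defined(__FMA__) || defined(__SSE2__)
#  include <immintrin.h>
#endif

// Hypothetical helper: out[i] = a[i] * b[i] + c[i] for four floats.
inline void mulAdd4(const float *a, const float *b, const float *c, float *out)
{
    assert(a && b && c && out);
#if defined(__FMA__)
    // Fused multiply-add: one instruction, one rounding step.
    _mm_storeu_ps(out, _mm_fmadd_ps(_mm_loadu_ps(a), _mm_loadu_ps(b),
                                    _mm_loadu_ps(c)));
#elif defined(__SSE2__)
    // Baseline x86 path: separate multiply and add.
    _mm_storeu_ps(out, _mm_add_ps(_mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)),
                                  _mm_loadu_ps(c)));
#else
    // Portable fallback.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

Built with -march=haswell or -march=native on a capable machine, this takes
the FMA path; a default x86-64 build takes the SSE2 one.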

I recommend the 64-bit Linux and Mac binaries include AVX2 versions for the 
libraries that most benefit from it (QtCore, QtGui and Qt3D).

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products




