[Development] Using SSE/NEON in Qt 6
Thiago Macieira
thiago.macieira at intel.com
Sat Feb 8 05:00:31 CET 2020
On Thursday, 6 February 2020 03:45:51 PST Lars Knoll wrote:
> Hi,
>
> We’ve seen that in a couple of places things like matrix operations are a
> CPU bottleneck. Being able to provide SSE/NEON optimised versions of some
> of those operations could help significantly.
> On x86/x64, we require SSE2 already anyway, so we should be able to use
> those unconditionally. On ARM, we can make this a compile time option with
> a C implementation as the fallback.
The x86-64 instruction set mandates SSE2, and it's used anyway because it's the
ABI way of doing floating-point operations. So SSE2 should be a no-brainer on
64-bit.
The question is what to do for 32-bit. I don't think it's very difficult,
though: if we have the SSE2 code in the header, then it's checked by
#ifdef __SSE2__
and there'll be a fallback path anyway. That fallback path will be the
non-SSE2 i386 code path.
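As a rough sketch of what such a header-inline function could look like (the
name addFloat4 is made up for illustration and is not existing Qt API):

```cpp
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE2 intrinsics; also pulls in the SSE ones
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for four floats.
inline void addFloat4(const float *a, const float *b, float *out)
{
#if defined(__SSE2__)
    // Vector path: unaligned 128-bit loads, one packed add, one store.
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    // Fallback path: what a non-SSE2 i386 (or any other) build compiles.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Both branches produce the same result, so callers don't need to know which one
the compiler selected.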
Then there's the question of how to package this. I suggest that we enable
SSE2 by default on the Windows 32-bit build, if we still have a 32-bit build.
We don't ship 32-bit Mac or Linux builds. For Linux distributions, this is
also already solved: build twice and place one lib in /usr/lib and the other
in /usr/lib/sse2. GNU libc knows how to load this.
> One problem is, that we can only get full benefit out of those if we can
> offer them inline. That would basically imply making our qsimd_p.h header
> public and including that one from qvectornd.h and qmatrixnxn.h (so that we
> can implement the operations using the SSE/NEON intrinsics). If we do that,
> we could e.g. implement QVector4D holding a __m128 value (and the neon
> equivalent on ARM).
We should simplify qsimd_p.h. There's some compatibility code there for GCC
pre-4.9, which we don't support any more in 5.14.
We need to decide whether we want runtime detection in inline headers. If we
do, then we need an API for that and qCpuHasFeature hasn't maintained ABI.
Moreover, we'll get requests for features that we don't currently need/use.
If we don't need runtime detection, then just #include <immintrin.h> or a
wrapper header to deal with MSVC not defining __SSE2__. qfloat16.h does that.
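A wrapper header along those lines could look roughly like this. It is a
sketch under my own assumptions, not a copy of what qfloat16.h does: it relies
on MSVC's documented _M_X64 and _M_IX86_FP macros, and the EXAMPLE_HAS_SSE2 and
exampleHasSse2 names are made up:

```cpp
// MSVC doesn't define __SSE2__. It does define _M_X64 for 64-bit x86 targets
// and _M_IX86_FP (2 when building with /arch:SSE2 or higher) for 32-bit ones.
#if defined(_MSC_VER) && !defined(__clang__) && !defined(__SSE2__)
#  if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#    define __SSE2__ 1
#  endif
#endif

#if defined(__SSE2__)
#  include <immintrin.h>
#  define EXAMPLE_HAS_SSE2 1   // hypothetical macro, for illustration only
#else
#  define EXAMPLE_HAS_SSE2 0
#endif

// Hypothetical compile-time query that headers built on top could use.
constexpr bool exampleHasSse2() { return EXAMPLE_HAS_SSE2 != 0; }
```

Headers that include this wrapper can then test __SSE2__ the same way on GCC,
Clang and MSVC.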
We also need to decide whether the macro normalisation that qsimd_p.h does
should be public too. qsimd_p.h defines the GCC/Clang macros that ICC and
MSVC don't (__F16C__, __PCLMUL__, __FMA__, __LZCNT__, etc.).
> I personally don’t think including qsimd.h (and implicitly immintrin.h) from
> our public headers would be a problem, but I’d be happy to hear arguments
> for/against it.
immintrin.h shouldn't be. qsimd_p.h only after clean-up and especially nailing
down the qCpuHasFeature() API.
> As a side note: SSE 4.1 offers some nice additional instructions that would
> simplify some of the operations. Should we keep the minimum requirement for
> SSE at version 2, or can we raise it to 4.1?
I don't recommend it. The gain compared to SSE2 is not enough and it brings
headaches. Instead, write code paths that use __AVX2__. Lots of modern CPUs
support it, and AVX2 builds can be co-installed with the main version on Linux
(like Clear Linux does and I've done for my openSUSE package) and on Macs.
It doesn't need to be #ifdef __AVX2__. Code should test for the minimum feature
it needs, like qhash.cpp and qstring.cpp do. That has the following
benefits:
1) Mac 64-bit and Clear Linux have SSE4.2 as a baseline, so those code paths
will be enabled even for the base version
2) People building from sources with -march=native (Gentoo users?) on older,
pre-AVX2 machines will benefit too
3) For matrix multiplication, I bet that __FMA__ is actually a much more
important gain than anything else and it comes with the same architecture as
__AVX2__. Use qCpuHasFeature(ArchHaswell).
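The "minimum feature" idea from the list above can be sketched like this: the
code checks __FMA__ directly instead of __AVX2__ and falls back to plain SSE2,
then to scalar code. The helper name mulAddFloat4 is hypothetical:

```cpp
#if defined(__SSE2__) || defined(__FMA__)
#  include <immintrin.h>
#endif

// Hypothetical helper: acc[i] += a[i] * b[i] for four floats, the basic step
// of a 4x4 matrix multiplication.
inline void mulAddFloat4(const float *a, const float *b, float *acc)
{
#if defined(__FMA__)
    // One fused multiply-add; enabled by e.g. -mfma or -march=haswell.
    const __m128 r = _mm_fmadd_ps(_mm_loadu_ps(a), _mm_loadu_ps(b),
                                  _mm_loadu_ps(acc));
    _mm_storeu_ps(acc, r);
#elif defined(__SSE2__)
    // Baseline vector path: separate multiply and add.
    const __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(acc, _mm_add_ps(_mm_loadu_ps(acc), prod));
#else
    // Scalar fallback for everything else.
    for (int i = 0; i < 4; ++i)
        acc[i] += a[i] * b[i];
#endif
}
```

Note that the fused path rounds once instead of twice, so results can differ
in the last bit from the other two paths for inputs that aren't exactly
representable.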
I recommend that the 64-bit Linux and Mac binaries include AVX2 versions of the
libraries that benefit most from it (QtCore, QtGui and Qt3D).
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products