[Development] Using SSE/NEON in Qt 6

Thu Feb 6 12:45:51 CET 2020

Hi,

We’ve seen in a couple of places that things like matrix operations are a CPU bottleneck. Providing SSE/NEON-optimised versions of some of those operations could help significantly.

On x86/x64, we already require SSE2, so we should be able to use SSE2 instructions unconditionally there. On ARM, we can make NEON a compile-time option, with a C implementation as the fallback.
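
As a rough illustration of that kind of compile-time selection, here is a minimal sketch. It assumes the usual compiler macros (__SSE2__, _M_X64, _M_IX86_FP, __ARM_NEON) are enough to detect the instruction set. The names SKETCH_USE_SSE2, SKETCH_USE_NEON and sketchAdd4 are made up for this example and are not Qt API.

```cpp
// Pick an instruction set at compile time; fall back to plain C++ otherwise.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>   // SSE2 (also pulls in the SSE float intrinsics)
#  define SKETCH_USE_SSE2 1
#elif defined(__ARM_NEON)
#  include <arm_neon.h>
#  define SKETCH_USE_NEON 1
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for four floats.
inline void sketchAdd4(const float *a, const float *b, float *out)
{
#if defined(SKETCH_USE_SSE2)
    // Unaligned loads/stores, so callers need no special alignment.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(SKETCH_USE_NEON)
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    // Scalar fallback for everything else.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

All three branches give the same result for this operation, so callers would not need to know which one was compiled in.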

One problem is that we only get the full benefit if we can offer these operations inline. That would basically imply making our qsimd_p.h header public and including it from qvectornd.h and qmatrixnxn.h (so that we can implement the operations using the SSE/NEON intrinsics). If we do that, we could, for example, implement QVector4D so that it holds a __m128 value (and the NEON equivalent on ARM).
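
To make the idea of a vector class holding a SIMD register more concrete, here is a hedged sketch. SketchVector4D, SketchReg and the sketchSet/sketchAdd/sketchStore helpers are hypothetical names for illustration only. This is not how QVector4D is implemented today, and a real version would go through qsimd rather than including the intrinsics headers directly.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
using SketchReg = __m128;
// _mm_set_ps takes its arguments from the highest lane to the lowest.
inline SketchReg sketchSet(float x, float y, float z, float w) { return _mm_set_ps(w, z, y, x); }
inline SketchReg sketchAdd(SketchReg a, SketchReg b) { return _mm_add_ps(a, b); }
inline void sketchStore(float *out, SketchReg r) { _mm_storeu_ps(out, r); }
#elif defined(__ARM_NEON)
#  include <arm_neon.h>
using SketchReg = float32x4_t;
inline SketchReg sketchSet(float x, float y, float z, float w)
{
    const float tmp[4] = {x, y, z, w};
    return vld1q_f32(tmp);
}
inline SketchReg sketchAdd(SketchReg a, SketchReg b) { return vaddq_f32(a, b); }
inline void sketchStore(float *out, SketchReg r) { vst1q_f32(out, r); }
#else
// Plain fallback with the same size as the SIMD registers.
struct SketchReg { float f[4]; };
inline SketchReg sketchSet(float x, float y, float z, float w) { return SketchReg{{x, y, z, w}}; }
inline SketchReg sketchAdd(SketchReg a, SketchReg b)
{
    SketchReg r;
    for (int i = 0; i < 4; ++i)
        r.f[i] = a.f[i] + b.f[i];
    return r;
}
inline void sketchStore(float *out, SketchReg r)
{
    for (int i = 0; i < 4; ++i)
        out[i] = r.f[i];
}
#endif

// Hypothetical stand-in for a QVector4D that stores one SIMD register.
class SketchVector4D
{
public:
    SketchVector4D(float x, float y, float z, float w) : v(sketchSet(x, y, z, w)) {}

    // Inline, so the compiler can keep operands in registers across calls.
    friend SketchVector4D operator+(SketchVector4D a, SketchVector4D b)
    {
        return SketchVector4D(sketchAdd(a.v, b.v));
    }

    void copyTo(float *out) const { sketchStore(out, v); }

private:
    explicit SketchVector4D(SketchReg r) : v(r) {}
    SketchReg v;
};
```

One consequence worth keeping in mind: a class with a __m128 member is 16-byte aligned, whereas a class holding four plain floats only needs 4-byte alignment, so a change like this affects the layout of any type that embeds the vector.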

I personally don’t think including qsimd.h (and implicitly immintrin.h) from our public headers would be a problem, but I’d be happy to hear arguments for/against it.

As a side note: SSE 4.1 offers some nice additional instructions that would simplify some of the operations. Should we keep the minimum requirement for SSE at version 2, or can we raise it to 4.1?
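
One example of such a simplification is a four-component dot product: SSE 4.1 has _mm_dp_ps, while plain SSE2 needs a multiply followed by two shuffle-and-add steps. The sketch below shows both. sketchDot4 is a made-up name, and the code assumes that __SSE4_1__ is defined when the compiler targets SSE 4.1 (true for GCC and Clang with -msse4.1). Non-x86 targets simply get a scalar loop here.

```cpp
#if defined(__SSE4_1__)
#  include <smmintrin.h>   // SSE 4.1
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>   // SSE2
#  define SKETCH_DOT_SSE2 1
#endif

// Hypothetical helper: dot product of two 4-float vectors.
inline float sketchDot4(const float *a, const float *b)
{
#if defined(__SSE4_1__)
    // 0xF1: multiply all four lanes, write the sum to lane 0.
    return _mm_cvtss_f32(_mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1));
#elif defined(SKETCH_DOT_SSE2)
    const __m128 m = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // Swap neighbouring lanes and add: every lane holds a pair sum.
    __m128 s = _mm_add_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap the two halves and add: every lane holds the full sum.
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
#else
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

Note that the branches add the products in a different order, so results can differ in the last bits for inputs that are not exactly representable.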

Cheers,
Lars