[Development] Using SSE/NEON in Qt 6

Thu Feb 6 19:29:01 CET 2020

On Thursday, 6 February 2020 12:45:51 CET Lars Knoll wrote:
> One problem is, that we can only get full benefit out of those if we can
> offer them inline. That would basically imply making our qsimd_p.h header
> public and including that one from qvectornd.h and qmatrixnxn.h (so that we
> can implement the operations using the SSE/NEON intrinsics). If we do that,
> we could e.g. implement QVector4D holding a __m128 value (and the neon
> equivalent on ARM).

Another option is to declare QVector4D as 16-byte aligned. Then SSE code can
still load and store it quickly, even if it isn't declared as holding a __m128
value. (An aligned load isn't much faster than an unaligned load on modern
architectures, but an aligned memory location can also be used directly as an
operand of other instructions, which saves many separate load instructions.)
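
To make the aligned-type idea concrete, here is a minimal sketch. Vec4 and
add() are hypothetical stand-ins, not the real QVector4D or any Qt API, and
the scalar branch is only there so the example builds where __SSE2__ is not
defined:

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics (pulls in the SSE ones too)
#endif

// Hypothetical stand-in for QVector4D: four plain floats, but with the
// type declared 16-byte aligned so SSE code can use aligned accesses.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

static_assert(alignof(Vec4) == 16, "Vec4 must be 16-byte aligned");
static_assert(sizeof(Vec4) == 16, "Vec4 must be exactly four floats");

inline Vec4 add(const Vec4 &a, const Vec4 &b)
{
    // The alignas above is what makes the aligned load/store below legal.
    assert(reinterpret_cast<std::uintptr_t>(&a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(&b) % 16 == 0);

    Vec4 r;
#if defined(__SSE2__)
    // Aligned loads and an aligned store; no __m128 member is needed.
    _mm_store_ps(&r.x, _mm_add_ps(_mm_load_ps(&a.x), _mm_load_ps(&b.x)));
#else
    // Scalar fallback (assumption: used on targets without SSE2).
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    r.w = a.w + b.w;
#endif
    return r;
}
```

The class layout stays four floats, so existing member access keeps working;
only the alignment requirement changes.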

> I personally don’t think including qsimd.h (and implicitly immintrin.h) from
> our public headers would be a problem, but I’d be happy to hear arguments
> for/against it.

I don't think it is a problem either. I just don't want to be the one
documenting it ;)

> As a side note: SSE 4.1 offers some nice additional instructions that would
> simplify some of the operations. Should we keep the minimum requirement for
> SSE at version 2, or can we raise it to 4.1?

That would be great, especially for QtCore. Though we could start by just
making SSE4.1 the default while still offering users (Linux distros, really)
the option to force it down to SSE2 only.
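
As a rough sketch of what "SSE4.1 by default, SSE2 on request" could look
like at the source level: the dot4() helper below is hypothetical, not
existing Qt code, and it simply keys off the compiler-defined __SSE4_1__ and
__SSE2__ macros, so the build flags decide which path is compiled:

```cpp
#include <cassert>
#if defined(__SSE4_1__)
#include <smmintrin.h>   // SSE4.1: _mm_dp_ps
#elif defined(__SSE2__)
#include <emmintrin.h>   // SSE2 baseline
#endif

// Hypothetical helper: dot product of two 4-float vectors.
inline float dot4(const float *a, const float *b)
{
    assert(a && b);
#if defined(__SSE4_1__)
    // One instruction: multiply all four lanes, sum them into lane 0.
    const __m128 r = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1);
    return _mm_cvtss_f32(r);
#elif defined(__SSE2__)
    // SSE2 fallback: multiply, then two shuffle+add steps.
    const __m128 m = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 shuf = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(m, shuf);   // (m0+m1, m0+m1, m2+m3, m2+m3)
    shuf = _mm_movehl_ps(shuf, sums);    // lane 0 now holds m2+m3
    sums = _mm_add_ss(sums, shuf);       // lane 0 = m0+m1+m2+m3
    return _mm_cvtss_f32(sums);
#else
    // Scalar fallback for targets without SSE (assumption).
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

With GCC or Clang, building with -msse4.1 would pick the first branch and
plain -msse2 the second. The result is the same up to floating-point
summation order.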

You could do the same with NEON, but I think we already use that 
unconditionally if detected at configure time.

Regards
'Allan