[Interest] SIMD accelerated wrappers for relevant Qt container classes (QVector, ...) or QByteArray?

Wed May 10 10:44:32 CEST 2017

On Tuesday May 09 2017 23:54:41 Allan Sandfeld Jensen wrote:

>It is nowhere near hand-written SIMD code, but then neither are generic 
>libraries ;)

Not if you don't use them right, no. But I'd like to think there's still a gray area where humans can outsmart the compiler, for instance because the auto-vectoriser doesn't use the full instruction set.
The kind of library I'm thinking of does assume that its users know what they're doing. But even if not,

>(but at 4x vectorization that is still twice as fast as not doing anything).

if that's what you get when you can multiply two QVectors "as is" instead of writing out the loop in plain C, there's still an advantage, no?
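To make the "as is" idea concrete, here is a minimal sketch of such a wrapper. It is only an illustration under stated assumptions: std::vector<float> stands in for QVector<float> so the snippet builds without Qt, and the operator* overload is a hypothetical helper, not something Qt or MacSTL provides. The body is a plain loop that an auto-vectoriser may or may not turn into SIMD code; a real SIMD library would use hand-written intrinsics at that spot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Qt or MacSTL): element-wise product of
// two equally sized vectors. std::vector stands in for QVector here.
inline std::vector<float> operator*(const std::vector<float> &a,
                                    const std::vector<float> &b)
{
    assert(a.size() == b.size());   // operands must have the same length
    std::vector<float> out(a.size());
    const float *pa = a.data();
    const float *pb = b.data();
    float *po = out.data();
    const std::size_t n = a.size();
    // Plain loop over raw pointers; the compiler is free to auto-vectorise
    // it (e.g. at -O3). A SIMD library would place intrinsics here instead.
    for (std::size_t i = 0; i < n; ++i)
        po[i] = pa[i] * pb[i];
    return out;
}
```

With this in place the calling code reduces to `std::vector<float> c = a * b;`, and the loop lives in exactly one place where it could later be swapped for a hand-tuned version.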

FWIW, this is my forked and updated copy of the MacSTL library I mentioned. WIP that I started neglecting after 2012 when I no longer had a need for it and my hardware started falling behind (I only have a 2011 i7 and a 2016 N3150 nowadays).
https://github.com/RJVB/MacSTL

>I doubt that would work. At least for intrinsic it would produce very poor 
>binary output since it would generate intermediate code the compiler then 
>can't map to the optimal instructions they were meant for.

That depends on how the intrinsic functions are defined and on whether compiler switches like -mavx do anything other than define the preprocessor token. I've never really looked at what gcc does precisely in this department.
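For reference, the preprocessor side of this can be sketched as follows. The only thing the sketch relies on is that gcc and clang define the __AVX__ token when AVX is enabled (for instance via -mavx); add_floats is a hypothetical helper name made up for illustration. Without the token the same function falls back to a scalar loop, so the snippet builds either way.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __AVX__
#include <immintrin.h>   // AVX intrinsics, only pulled in when AVX is enabled
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for n floats.
inline void add_floats(const float *a, const float *b, float *out,
                       std::size_t n)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
#ifdef __AVX__
    // 8 floats per iteration, using unaligned loads and stores.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    // Scalar tail, or the whole range when __AVX__ is not defined.
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Building the same file with and without -mavx and comparing the generated assembly would be one way to see what the switch does beyond defining the token.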

>Maybe it works for inline assembler?

If memory serves me well, the last time I looked at the intrinsic header files shipped with clang on Mac they just defined macros expanding to inline assembly. So yeah, that should work.

Cheers,
René