[Interest] SIMD accelerated wrappers for relevant Qt container classes (QVector, ...) or QByteArray?

Tue May 9 23:54:41 CEST 2017

On Tuesday 09 May 2017, René J.V. Bertin wrote:
> On Tuesday May 09 2017 22:49:53 Allan Sandfeld Jensen wrote:
> 
> Hi,
> 
> > Anything you can write with SIMD arrays is simple enough that the
> > compiler can auto-vectorize it as well. Just pretend the arrays you
> > have are SIMD
> 
> I've been out of this for a few years but I'd be surprised if that were
> really accurate today. If it were, people wouldn't be developing and using
> things like Vc. I think. I wouldn't at least :)
> 
> I *am* willing to believe that auto-vectorised code can outperform
> hand-written SIMD code (and code using SIMD arrays) for certain classes of
> problems. I've seen at least one example of that (and that was about 7
> years ago already).
> 
Auto-vectorized code is nowhere near hand-written SIMD code, but then neither
are generic libraries ;)
You only get the best performance by using the right logic and the right
instructions, hand-picked to work together, and you really need to work at
that level to get the full performance. Letting the compiler or a library do
it for you costs at least half the speed (but at 4x vectorization that is
still twice as fast as not vectorizing at all).

> > > Also, to get SIMD support and auto-vectorisation you need the correct
> > > -march CPU flag, no?
> > 
> > You would need that too with any SIMD-library.
> 
> Of course, but there you could probably also just use -D__SSE3__ -D__AVX__
> etc.
> 
I doubt that would work. At least for intrinsics it would produce very poor
binary output, since it would generate intermediate code that the compiler
then can't map to the optimal instructions the intrinsics were meant for.

Maybe it works for inline assembler?

`Allan