[Interest] Cross platform accelerated instructions framework

Thu May 14 17:04:33 CEST 2015

On Thursday 14 May 2015 14:52:45 Allan Sandfeld Jensen wrote:
> Alternatively use vector intrinsics, but the generic intrinsics are not
> that powerful.
> 
> To write code in a way that the compiler can auto-vectorize, put the
> CPU-intensive work in simple inner loops without function calls (or only
> inlined ones), index arrays only by the loop counter, and avoid branches
> as much as possible. If you do need branches, write them as conditional
> assignments of the form c ? a : b.
> 
> And no, iOS and Android ARM are not identical. On iOS you can rely on NEON
> iDiv, and on newer devices AArch64; on Android, NEON is optional (but present
> in all high-end devices), and AArch64 CPUs are not yet commonly available.
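
As a rough sketch of the loop shape described in the quoted advice (the
function name clampRange and its parameters are made up for illustration,
not taken from Qt): one simple inner loop, arrays indexed only by the loop
counter, no function calls, and the branch written as a conditional
assignment. Whether a given compiler actually vectorizes it depends on the
compiler, the optimization flags and the target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: clamp every element of src into [lo, hi] and store
// the result in dst. src and dst are assumed not to overlap.
void clampRange(const std::int32_t *src, std::int32_t *dst, std::size_t n,
                std::int32_t lo, std::int32_t hi)
{
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t v = src[i];
        // Conditional assignments instead of if/else blocks.
        const std::int32_t atLeastLo = v < lo ? lo : v;
        dst[i] = atLeastLo > hi ? hi : atLeastLo;
    }
}
```

Calling clampRange(in, out, 5, 0, 10) on the input {-5, 0, 3, 10, 42} leaves
{0, 0, 3, 10, 10} in out.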

You could also write the code using the intrinsics, but be careful not to run
into the problem that caused QTBUG-30440.

Another issue will be detecting what capabilities the processors have. On x86,
you can use the CPUID instruction (though you can also be certain that all Atom
processors have at least SSSE3). On Linux systems, you can get the hardware
capabilities by reading /proc/self/auxv and looking for the AT_HWCAP entry.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center