[Development] Update: State of x86 SIMD in Qt

Mon Jan 2 14:10:57 CET 2012

Here's how my work currently stands:

On Thursday, 29 December 2011 17.00.09, Thiago Macieira wrote:
> 1) Drop the MMX code and the 3dNow! extensions now

Done. Also dropped the detection mechanism in qsimd.cpp, configure and 
configure.exe. I also updated the macros to be simpler to use. For example, for 
SSE2 support, we now should have:

QT_COMPILER_SUPPORTS_SSE2 - the compiler does support the feature, but it 
doesn't mean the feature is enabled

__SSE2__ - the feature is enabled by the compiler

Since MSVC doesn't define the latter macro itself, I made Qt define it for 
MSVC builds. Testing is pending.
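
To illustrate the difference between "supported" and "enabled", here is a 
minimal sketch of code keyed on __SSE2__. The MSVC block is only an assumption 
about how the macro could be derived from MSVC's own _M_X64 and _M_IX86_FP 
macros; it is not necessarily what the patch does. addFour is a hypothetical 
helper, not a Qt function.

```cpp
#include <cstddef>

// Assumption: on MSVC, SSE2 is always present on x64 and is enabled on
// 32-bit x86 by /arch:SSE2 or higher, which sets _M_IX86_FP to 2.
#if defined(_MSC_VER) && !defined(__SSE2__)
#  if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#    define __SSE2__ 1
#  endif
#endif

#ifdef __SSE2__
#  include <emmintrin.h>
#endif

// Hypothetical helper: adds four 32-bit integers element-wise.
void addFour(const int *a, const int *b, int *out)
{
#ifdef __SSE2__
    // SSE2 is enabled for this translation unit, so the intrinsics can be
    // used without any runtime check.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), _mm_add_epi32(va, vb));
#else
    // Portable fallback when SSE2 is not enabled by the compiler flags.
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

QT_COMPILER_SUPPORTS_SSE2 plays no role inside such a function: it only tells 
the build system that a separate file *could* be compiled with SSE2 turned on.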

I also made qsimd.cpp record which compiler settings were enabled at Qt build 
time and abort with qFatal if the CPU is missing them. This sounds bad, but it 
actually isn't: if Qt was compiled with SSE4.1 support enabled, then 
qstring.cpp would use it and your application would crash very, very early on 
anyway with SIGILL. This just makes the application quit with a nicer error 
message.
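
A rough sketch of that startup check, under stated assumptions: the feature 
bits, compiledFeatures() and missingFeatures() are hypothetical names made up 
for illustration, and the real qsimd.cpp uses its own enumeration and 
detection code.

```cpp
#include <cstdint>
#include <string>

// Hypothetical feature bits, for illustration only.
enum CpuFeature : std::uint32_t {
    FeatureSSE2  = 1u << 0,
    FeatureSSE3  = 1u << 1,
    FeatureSSSE3 = 1u << 2,
    FeatureSSE41 = 1u << 3,
    FeatureSSE42 = 1u << 4,
    FeatureAVX   = 1u << 5
};

// Features the compiler was allowed to use when this file was built.
inline std::uint32_t compiledFeatures()
{
    std::uint32_t f = 0;
#ifdef __SSE2__
    f |= FeatureSSE2;
#endif
#ifdef __SSE3__
    f |= FeatureSSE3;
#endif
#ifdef __SSSE3__
    f |= FeatureSSSE3;
#endif
#ifdef __SSE4_1__
    f |= FeatureSSE41;
#endif
#ifdef __SSE4_2__
    f |= FeatureSSE42;
#endif
#ifdef __AVX__
    f |= FeatureAVX;
#endif
    return f;
}

// Returns the names of the features in 'required' that are absent from
// 'detected', each preceded by a space. Empty means the CPU is sufficient.
std::string missingFeatures(std::uint32_t required, std::uint32_t detected)
{
    static const struct { std::uint32_t bit; const char *name; } table[] = {
        { FeatureSSE2, "sse2" },     { FeatureSSE3, "sse3" },
        { FeatureSSSE3, "ssse3" },   { FeatureSSE41, "sse4.1" },
        { FeatureSSE42, "sse4.2" },  { FeatureAVX, "avx" }
    };
    std::string result;
    for (const auto &entry : table) {
        if ((required & entry.bit) && !(detected & entry.bit)) {
            result += ' ';
            result += entry.name;
        }
    }
    return result;
}
```

At startup, the library would compare compiledFeatures() with whatever the 
runtime CPU detection reports and, if the returned string is not empty, pass 
it to qFatal instead of letting the process die later with SIGILL.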

Configure prints on my machine now (gcc -march=core2 -mtune=corei7-avx):

SSE2/SSE3/SSSE3......... yes/yes/yes
SSE4.1/SSE4.2........... yes/yes
AVX/AVX2................ yes/no
Default CPU features.... cx16 mmx sse sse2 sse3 ssse3

ICC makes the AVX2 setting go to "yes", as will the update to GCC 4.7. The ARM 
build says:

SSE2/SSE3/SSSE3......... no/no/no
SSE4.1/SSE4.2........... no/no
AVX/AVX2................ no/no
iWMMXt support ......... no
NEON support ........... yes
Default CPU features.... neon

> 2) Compile qdrawhelper.cpp once, normally, no change to compiler flags

Done. If __SSE2__ is set, then qdrawhelper_plain.cpp will *not* compile the 
helpers. It will simply call qInitDrawhelperSse2().
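
The selection described here could look roughly like the sketch below. All 
names (DrawHelper, fill_plain, fill_sse2, initDrawHelper) are hypothetical 
stand-ins, and fill_sse2 is a plain loop standing in for the copy built with 
SSE2 enabled; only the shape of the #ifdef is the point.

```cpp
#include <cstdint>

enum class Backend { Plain, Sse2 };

using FillFunc = void (*)(std::uint32_t *dest, int count, std::uint32_t color);

struct DrawHelper {
    Backend backend;
    FillFunc fill;
};

// Stand-in for the helper from the translation unit built with SSE2.
static void fill_sse2(std::uint32_t *dest, int count, std::uint32_t color)
{
    for (int i = 0; i < count; ++i)
        dest[i] = color;
}

#ifndef __SSE2__
// The plain helper is only compiled when the baseline build lacks SSE2.
static void fill_plain(std::uint32_t *dest, int count, std::uint32_t color)
{
    for (int i = 0; i < count; ++i)
        dest[i] = color;
}
#endif

// 'cpuHasSse2' is what the runtime CPU detection reported.
DrawHelper initDrawHelper(bool cpuHasSse2)
{
#ifdef __SSE2__
    // The whole build already requires SSE2: no runtime check, no fallback.
    (void)cpuHasSse2;
    return { Backend::Sse2, fill_sse2 };
#else
    if (cpuHasSse2)
        return { Backend::Sse2, fill_sse2 };
    return { Backend::Plain, fill_plain };
#endif
}
```

With __SSE2__ set, the plain helper never reaches the binary, which is where 
the size saving for such builds comes from.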

> 3) If the compiler flags from the user do not already include -msse2,
> compile it *again* with -msse2; the same applies for -mfpu=neon on ARM.

Done. 

I also made it *not* add -msse2 if SSE2 was already enabled by the user. 
That's the case for everyone on x86-64, but it's more important for cases like 
enabling SSE3, SSSE3 (that was MeeGo) or more in the compiler flags. By doing 
nothing, we compile the helpers with those settings. If we passed -msse2, we 
would in fact *downgrade* the support.

> 4) If the compiler flags don't already include -mavx, do it *again* with
> -mavx.

Done.

> 5) Select a few operations that might benefit from SSE3 or SSSE3
> implementations on top of the SSE2 ones (my guess is it's only
> blend_argb32_on_argb32)

Done. If qdrawhelper_sse2.cpp is compiled with __SSSE3__, it unconditionally 
uses qdrawhelper_ssse3.cpp's functions and will not compile the SSE2 ones. 
That also helps in AVX mode.
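
As a sketch of that compile-time choice, the hypothetical helper below picks 
an SSSE3 byte shuffle when __SSSE3__ is defined and only otherwise compiles 
the SSE2 (or scalar) version. It swaps the red and blue channels of four 
ARGB32 pixels; it is not the blend function from Qt, just a small operation 
where SSSE3 offers a shorter sequence.

```cpp
#include <cstdint>

#if defined(__SSSE3__)
#  include <tmmintrin.h>
#elif defined(__SSE2__)
#  include <emmintrin.h>
#endif

// Hypothetical helper: swaps red and blue in four 0xAARRGGBB pixels.
void swapRedBlue4(const std::uint32_t *src, std::uint32_t *dst)
{
#if defined(__SSSE3__)
    // One pshufb: exchange bytes 0 and 2 of every 32-bit pixel.
    const __m128i mask = _mm_setr_epi8(2, 1, 0, 3, 6, 5, 4, 7,
                                       10, 9, 8, 11, 14, 13, 12, 15);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst),
                     _mm_shuffle_epi8(v, mask));
#elif defined(__SSE2__)
    // SSE2 only: mask out alpha/green, then swap the 16-bit halves that
    // hold red and blue with two shifts.
    const __m128i keep = _mm_set1_epi32(static_cast<int>(0xff00ff00u));
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
    __m128i ag = _mm_and_si128(v, keep);
    __m128i rb = _mm_andnot_si128(keep, v);
    __m128i swapped = _mm_or_si128(_mm_slli_epi32(rb, 16),
                                   _mm_srli_epi32(rb, 16));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst),
                     _mm_or_si128(ag, swapped));
#else
    // Scalar fallback for builds without SSE2.
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t p = src[i];
        dst[i] = (p & 0xff00ff00u) | ((p & 0xffu) << 16) | ((p >> 16) & 0xffu);
    }
#endif
}
```

Because the branch is resolved by the preprocessor, a build with SSSE3 (or 
AVX, which implies it) contains only the first variant.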

Overall:

The code compiles fine with GCC on x86 and x86-64 and with ICC on x86-64. I 
still have a linker error on ARM with Neon.

Next, I need to run the regression tests and investigate further improvements. 
I'm thinking of merging in qmemfunctions.cpp and qblendfunctions.cpp.

I haven't done any benchmarks yet. In terms of library size, I expect all 
libraries on x86 or x86-64 to increase in size, unless you disable some of the 
builds. For ARM, if your compiler flags *already* have -mfpu=neon, then there 
should be neither a change in size nor big performance gains, except those due 
to better compiler optimisation.

If your flags don't have -mfpu=neon, then the library size will increase unless 
you also specify -no-neon.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden