[Development] State of x86 SIMD in Qt

Thiago Macieira thiago.macieira at intel.com
Thu Dec 29 15:12:53 CET 2011


Hello

[Glossary at the bottom]

I began adding an AVX-mode build for certain files in Qt. While I don't plan on 
writing new routines using AVX instructions, the presence of the VEX prefix 
should already improve performance. It was quite easy to do so for qimage_sse2 
and qimage_ssse3, which are newer code. That contribution is ready.
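
To illustrate what an AVX-mode build means here, a minimal sketch: the source
below is ordinary SSE2 intrinsics code, and only the compiler flag changes.
Built with -mavx (GCC, Clang, ICC), the compiler should emit the VEX-encoded
form of the same operation. The names sseEncoding() and addFourInts() are made
up for this example; they are not Qt functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE2 intrinsics
#endif

// Hypothetical helper: reports how SSE intrinsics in this translation unit
// will be encoded, based on the macros the compiler predefines.
const char *sseEncoding()
{
#if defined(__AVX__)
    return "VEX";          // built with -mavx (or better)
#elif defined(__SSE2__)
    return "legacy SSE";   // built with SSE2 but without AVX
#else
    return "none";         // no SSE2 at compile time; scalar fallback is used
#endif
}

// Hypothetical helper: adds four 32-bit integers lane by lane.
void addFourInts(const std::int32_t *a, const std::int32_t *b, std::int32_t *out)
{
    assert(a && b && out);
#if defined(__SSE2__)
    // Unaligned loads/stores, so the caller needs no special alignment.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The result of addFourInts() is the same in every build; only the instruction
encoding differs, which is why no new routines are needed to get the benefit.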

When I turned to the draw helpers, however, I almost despaired. The 
initialisation code is a mess to disentangle. I managed to add VEX-encoded 
versions of qdrawhelper_sse2 and qdrawhelper_ssse3 easily, like for qimage, 
and then I began looking at the older code.

We have a problem.

The files qdrawhelper_mmx, qdrawhelper_mmx3dnow, qdrawhelper_sse and 
qdrawhelper_sse3dnow need to go away. The MMX technology registers are old and 
few -- and the two "sse" files don't use SSE, they simply use MMX instructions 
added alongside the SSE ones. Not only that, we're using the MMX registers for 
single-data floating point, instead of using them for single-instruction 
multiple-data (SIMD).

To make matters worse, doing so is a *pessimisation* in 64-bit mode. Since 
all 64-bit capable CPUs have SSE2, the 64-bit compilation uses SSE 
instructions for floating point *by* *default*. By doing a runtime detection of 
MMX, we use the older instructions, with fewer registers (more register 
pressure). I'm almost certain the helpers I listed above run worse than the 
code generated by the plain C++ code, in some cases.
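
The point about defaults can be checked at compile time. This is a sketch that
assumes a GCC-compatible compiler (GCC, Clang, ICC), which predefines
__SSE_MATH__ and __SSE2_MATH__ when floating point is done in SSE registers.
The function names are invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: which unit the compiler targets for floating point
// in this translation unit, judged from GCC-style predefined macros.
std::string defaultFpMath()
{
#if defined(__SSE2_MATH__)
    return "sse2";   // float and double both in SSE registers
#elif defined(__SSE_MATH__)
    return "sse";    // float in SSE registers, double still on x87
#elif defined(__i386__)
    return "x87";    // 32-bit x86 build without -mfpmath=sse
#else
    return "other";  // not x86, or a compiler without these macros
#endif
}

// Hypothetical helper: true for a 64-bit x86 build on GCC-compatible compilers.
bool isX86_64Build()
{
#if defined(__x86_64__)
    return true;
#else
    return false;
#endif
}

// With default flags, a 64-bit x86 build is expected to report "sse2".
void verifyAssumption()
{
    assert(!isX86_64Build() || defaultFpMath() == "sse2");
}
```

On a 32-bit build, the same check shows whether -mfpmath=sse (together with
-msse or -msse2) took effect for the file in question.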

Proposed solutions:

1) immediately disable the use of MMX technology registers in 64-bit mode. 
Turns out that this is already supported in the code because MSVC in 64-bit 
mode does not offer MMX support. We only need to apply this to GCC and ICC.

2) apply the same for 32-bit builds targeting recent CPUs (e.g., when the 
user passes -march=). We can add #pragma GCC target("fpmath=sse").

3) compile the plain C++ code in SSE mode with -mfpmath=sse so that, at 
runtime, we can choose *that* instead of MMX.

4) rewrite all the past code, where applicable, to use SSE and SSE2 (or 
better), but in SIMD mode. This code can also be built with VEX prefixes.

5) drop the MMX code.

We can do all of 1-5, do 1+4+5, or go straight to 4 and 5 only.
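
As a rough sketch of where options 3 to 5 lead: a plain C++ routine, an SSE2
SIMD routine, and a runtime choice between the two, with no MMX path at all.
This is not the actual draw helper code. The names (PlusFunc, plus_generic,
plus_sse2, cpuHasSse2, selectPlus) and the saturating-add operation are
assumptions made for illustration, and __builtin_cpu_supports() is a
GCC/Clang builtin (GCC 4.8 or later), not something the current code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#  include <emmintrin.h>   // SSE2 intrinsics
#endif

using PlusFunc = void (*)(std::uint8_t *dst, const std::uint8_t *src, std::size_t len);

// Plain C++ version: per-byte saturating add, dst = min(dst + src, 255).
void plus_generic(std::uint8_t *dst, const std::uint8_t *src, std::size_t len)
{
    assert(len == 0 || (dst && src));
    for (std::size_t i = 0; i < len; ++i) {
        unsigned sum = unsigned(dst[i]) + unsigned(src[i]);
        dst[i] = std::uint8_t(sum > 255 ? 255 : sum);
    }
}

#if defined(__SSE2__)
// SSE2 version in true SIMD mode: 16 bytes per instruction.
void plus_sse2(std::uint8_t *dst, const std::uint8_t *src, std::size_t len)
{
    assert(len == 0 || (dst && src));
    std::size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i *>(dst + i));
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), _mm_adds_epu8(d, s));
    }
    plus_generic(dst + i, src + i, len - i);   // leftover tail bytes
}
#endif

// Runtime check; returns false where the builtin is unavailable.
bool cpuHasSse2()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Pick the best available routine once; callers keep the returned pointer.
PlusFunc selectPlus()
{
#if defined(__SSE2__)
    if (cpuHasSse2())
        return plus_sse2;
#endif
    return plus_generic;
}
```

In a real build the SSE2 routine would live in its own file compiled with
-msse2, as the existing qdrawhelper_sse2 file is, so the rest of the library
can still target older CPUs.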

I can get started, but I need to know: how is this code tested? How can I 
verify that things are working?



Glossary:
MMX - MultiMedia eXtensions, introduced with the Pentium MMX. There are 8 MMX 
registers, 64-bit in width, aliased to the x87 floating point registers. After 
using the MMX registers, one must execute the "emms" instruction (or "femms" 
on CPUs with 3dNow!) before using x87 floating point again.

3dNow! - AMD extensions to MMX

SIMD - Single Instruction Multiple Data, the concept of one instruction 
operating on multiple data elements at the same time.

SSE - Streaming SIMD Extensions, introduced with the Pentium III (1999). There 
are 8 SSE (XMM) registers in 32-bit mode and 16 of them in 64-bit mode, all 
128-bit (or more) in width.

SSE2 - introduced with the Pentium 4 in late 2000 on Intel, 2003 on AMD, and 
*all* 64-bit CPUs have it. SSE2 is used for floating point operations in 
64-bit mode.

SSE3 - Pentium 4 (2004) and Athlon 64 (2005).

SSSE3 - Intel Core 2 (2006) and Atom, AMD Bulldozer (2011)

SSE4.1, SSE4.2, SSE4a - a mess

VEX prefix - a new way of encoding SSE instructions on x86, with room for one 
more register operand, so operations can take a non-destructive source. All 
SSE instructions can use this, but MMX instructions cannot.

AVX - Advanced Vector Extensions, introduced with the SandyBridge (second 
generation Core-iX), doubles the width of the SSE registers to 256-bit but 
doesn't add many instructions to use those extra bits.

AVX2 - to be introduced with the Haswell, extends the integer SSE 
instructions to 256-bit

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden