[Development] State of x86 SIMD in Qt
thiago.macieira at intel.com
Thu Dec 29 15:12:53 CET 2011
[Glossary at the bottom]
I began adding an AVX-mode build for certain files in Qt. While I don't plan on
writing new routines using AVX instructions, the presence of the VEX prefix
should already improve performance. It was quite easy to do so for qimage_sse2
and qimage_ssse3, which are newer code. That contribution is ready.
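An AVX-mode build of existing SSE2 code needs no source changes. The same intrinsics, compiled with AVX enabled, come out VEX-encoded. A minimal sketch of the idea follows. It assumes GCC or Clang on x86, and it imitates a second `-mavx` compile of the file by using the `target("avx")` function attribute inside one file. `average_bytes` and the other names are hypothetical, not Qt functions.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch assumption: GCC/Clang on x86 with SSE2; elsewhere only scalar code.
#if defined(__GNUC__) && defined(__SSE2__)
#  define SKETCH_X86_SIMD 1
#  include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical routine: dst[i] = (a[i] + b[i] + 1) / 2, the rounded byte
// average that SSE2's pavgb instruction computes.
static void average_bytes_scalar(std::uint8_t *dst, const std::uint8_t *a,
                                 const std::uint8_t *b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>((a[i] + b[i] + 1) / 2);
}

#ifdef SKETCH_X86_SIMD
// The SSE2 source, written once and inlined into each build below.
static inline __attribute__((always_inline))
void average_bytes_body(std::uint8_t *dst, const std::uint8_t *a,
                        const std::uint8_t *b, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), _mm_avg_epu8(va, vb));
    }
    average_bytes_scalar(dst + i, a + i, b + i, n - i);  // leftover bytes
}

// Legacy SSE encoding: two-operand forms such as pavgb.
static void average_bytes_sse2(std::uint8_t *dst, const std::uint8_t *a,
                               const std::uint8_t *b, std::size_t n)
{
    average_bytes_body(dst, a, b, n);
}

// Same source compiled with AVX enabled: the compiler emits VEX-encoded
// forms such as vpavgb, whose destination is separate from both sources.
__attribute__((target("avx")))
static void average_bytes_avx(std::uint8_t *dst, const std::uint8_t *a,
                              const std::uint8_t *b, std::size_t n)
{
    average_bytes_body(dst, a, b, n);
}
#endif

// Entry point: uses the AVX-mode build only if the CPU reports AVX support.
void average_bytes(std::uint8_t *dst, const std::uint8_t *a,
                   const std::uint8_t *b, std::size_t n)
{
#ifdef SKETCH_X86_SIMD
    if (__builtin_cpu_supports("avx"))
        average_bytes_avx(dst, a, b, n);
    else
        average_bytes_sse2(dst, a, b, n);
#else
    average_bytes_scalar(dst, a, b, n);
#endif
}
```

Both builds compute the same result. Only the instruction encoding differs, so the AVX-mode copy can be verified against the plain SSE2 one.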
When I turned to the draw helpers, however, I almost despaired: the
initialisation code is a mess to disentangle. Adding VEX-encoded versions of
qdrawhelper_sse2 and qdrawhelper_ssse3 was as easy as it had been for qimage,
but then I began looking at the older code.
We have a problem.
The files qdrawhelper_mmx, qdrawhelper_mmx3dnow, qdrawhelper_sse and
qdrawhelper_sse3dnow need to go away. The MMX technology registers are old and
few -- and the two "sse" files don't use SSE; they simply use the MMX
instructions that were added alongside the SSE ones. Not only that, we're using
the MMX registers for single-data floating point instead of using them for
single-instruction, multiple-data (SIMD) work.
To make matters worse, doing so is a *pessimisation* in 64-bit mode. Since
all 64-bit capable CPUs have SSE2, the 64-bit compilation uses SSE
instructions for floating point *by* *default*. When the runtime detection
finds MMX, we switch to the older instructions, which have fewer registers
(more register pressure). I'm almost certain that, in some cases, the helpers
listed above run slower than the code the compiler generates from the plain
C++ code.
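Runtime detection of this kind usually comes down to choosing a function pointer once, from the CPU features reported at startup. The sketch below assumes GCC on x86 and falls back to the generic build elsewhere. It builds a plain C++ routine a second time under `#pragma GCC target("sse2,fpmath=sse")`, so its scalar float math uses SSE registers, and it hands that build to CPUs with SSE2. The names (`lerp_generic`, `lerp_sse_math`, `select_lerp`) are hypothetical. A generic build stands in for the MMX routine that would otherwise be picked.

```cpp
// Sketch assumption: GCC on x86; other compilers only get the generic build.
#if defined(__GNUC__) && !defined(__clang__) && (defined(__i386__) || defined(__x86_64__))
#  define HAVE_SSE_MATH_BUILD 1
#endif

// Generic build (hypothetical helper). In a 32-bit build without
// -mfpmath=sse, GCC uses x87 instructions for this arithmetic.
static float lerp_generic(float a, float b, float t)
{
    return a + (b - a) * t;
}

#ifdef HAVE_SSE_MATH_BUILD
#pragma GCC push_options
// Same source, but scalar float math goes through SSE registers instead of
// the x87 register stack. On x86-64 this is already the default.
#pragma GCC target("sse2,fpmath=sse")
static float lerp_sse_math(float a, float b, float t)
{
    return a + (b - a) * t;
}
#pragma GCC pop_options
#endif

using LerpFn = float (*)(float, float, float);

// Pick the implementation once, based on what the CPU reports at runtime.
LerpFn select_lerp()
{
#ifdef HAVE_SSE_MATH_BUILD
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return lerp_sse_math;
#endif
    return lerp_generic;
}
```

Callers store the result of `select_lerp()` once and call through the pointer afterwards, so the detection cost is paid only at startup.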
My proposal:
1) immediately disable the use of the MMX technology registers in 64-bit mode.
It turns out that the code already supports this, because MSVC in 64-bit mode
does not offer MMX support. We only need to apply the same to GCC and ICC.
2) apply the same to 32-bit builds targeting recent CPUs (e.g., when the
user passes -march=). We can add #pragma GCC target("fpmath=sse") for that.
3) compile the plain C++ code in SSE mode with -mfpmath=sse so that, at
runtime, we can choose *that* instead of MMX.
4) rewrite all the old code, where applicable, to use SSE and SSE2 (or
better), this time in SIMD mode. This code can also be built with VEX prefixes.
5) drop the MMX code.
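A sketch of what step 4 can look like. A 64-bit MMX register holds two ARGB32 pixels and a 128-bit SSE2 register holds four, so a rewritten loop handles four pixels per iteration. The example is a premultiplied ARGB32 source-over blend with hypothetical names (`blend_source_over`, `source_over_scalar`, `byte_mul`). It is not Qt's qdrawhelper code. Its rounding, `(t + (t >> 8)) >> 8` with `t = x * a + 128`, is one common choice and not necessarily Qt's. Built with `-mavx`, the same intrinsics would also get VEX encoding.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Rounded x * a / 255 for 8-bit x and a (hypothetical helper).
static inline std::uint32_t byte_mul(std::uint32_t x, std::uint32_t a)
{
    std::uint32_t t = x * a + 128;
    return (t + (t >> 8)) >> 8;
}

// Scalar reference: dst = src + dst * (255 - alpha(src)) / 255 per channel,
// saturated to 255, for 0xAARRGGBB pixels.
static std::uint32_t source_over_scalar(std::uint32_t s, std::uint32_t d)
{
    std::uint32_t ia = 255 - (s >> 24);
    std::uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint32_t c = ((s >> shift) & 0xff) + byte_mul((d >> shift) & 0xff, ia);
        r |= (c > 255 ? 255 : c) << shift;
    }
    return r;
}

// Blends n pixels of src over dst in place: four pixels per SSE2 iteration,
// then a scalar tail that uses the same formula.
void blend_source_over(std::uint32_t *dst, const std::uint32_t *src, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    const __m128i c128 = _mm_set1_epi16(128);
    const __m128i c255 = _mm_set1_epi32(255);
    for (; i + 4 <= n; i += 4) {
        __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i *>(dst + i));
        // Inverse source alpha in the low 16 bits of each 32-bit lane,
        // copied to both 16-bit halves, then spread over the 4 channels.
        __m128i ia = _mm_sub_epi32(c255, _mm_srli_epi32(s, 24));
        ia = _mm_or_si128(ia, _mm_slli_epi32(ia, 16));
        __m128i ia_lo = _mm_unpacklo_epi32(ia, ia);  // pixels 0 and 1
        __m128i ia_hi = _mm_unpackhi_epi32(ia, ia);  // pixels 2 and 3
        // Widen destination bytes to 16 bits and apply byte_mul.
        __m128i d_lo = _mm_mullo_epi16(_mm_unpacklo_epi8(d, zero), ia_lo);
        __m128i d_hi = _mm_mullo_epi16(_mm_unpackhi_epi8(d, zero), ia_hi);
        d_lo = _mm_add_epi16(d_lo, c128);
        d_hi = _mm_add_epi16(d_hi, c128);
        d_lo = _mm_srli_epi16(_mm_add_epi16(d_lo, _mm_srli_epi16(d_lo, 8)), 8);
        d_hi = _mm_srli_epi16(_mm_add_epi16(d_hi, _mm_srli_epi16(d_hi, 8)), 8);
        // Narrow back to bytes and add the source with unsigned saturation.
        __m128i r = _mm_adds_epu8(s, _mm_packus_epi16(d_lo, d_hi));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), r);
    }
#endif
    for (; i < n; ++i)
        dst[i] = source_over_scalar(src[i], dst[i]);
}
```

Keeping a scalar reference with the same rounding lets the SIMD loop be checked pixel for pixel against it.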
We can do all of 1-5, only 1+4+5, or go straight to 4 and 5.
I can get started, but I need to know: how is this code tested? How can I
verify that things are working?
Glossary:
MMX - MultiMedia eXtensions, introduced with the Pentium MMX. There are 8 MMX
registers, 64-bit in width, aliased to the x87 floating point registers. After
using the MMX registers, one must execute the "emms" instruction (or AMD's
faster "femms") before any x87 floating point code runs.
3dNow! - AMD extensions to MMX
SIMD - Single Instruction Multiple Data, a model in which one instruction
operates on multiple data elements at the same time.
SSE - Streaming SIMD Extensions, introduced with the Pentium III (1999). There
are 8 XMM registers in 32-bit mode and 16 of them in 64-bit mode, all 128-bit
(or more) in width.
SSE2 - introduced in 2001 on Intel and 2003 on AMD; *all* 64-bit CPUs have
it. SSE2 is used for floating point operations in 64-bit mode.
SSE3 - Pentium 4 (2004) and Athlon 64 (2005).
SSSE3 - Intel Core 2 (2006) and Atom, AMD Bulldozer (2011).
SSE4.1, SSE4.2, SSE4a - a mess
VEX prefix - a new way of encoding SSE instructions on x86 that adds one more
register operand, so operations take a non-destructive source. All SSE
instructions can use it, but MMX instructions cannot.
AVX - Advanced Vector Extensions, introduced with Sandy Bridge (the
second-generation Core i-series), doubles the width of the SSE registers to
256 bits but doesn't add many instructions to use those extra bits.
AVX2 - to be introduced with Haswell, extends all past SSE instructions to
256 bits.
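In code, the MMX rule looks like this with GCC-style intrinsics. `_mm_empty()` issues emms once the MMX work is done, before any x87 floating-point code can run. The helper name is hypothetical. The sketch assumes a compiler that provides `<mmintrin.h>` (GCC or Clang on x86; 64-bit MSVC offers no MMX support) and falls back to plain C++ elsewhere. Depending on the compiler version, the intrinsic may be lowered to SSE2 instead of real MMX instructions.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__MMX__) && defined(__GNUC__)
#include <mmintrin.h>  // MMX intrinsics
#endif

// Hypothetical helper: adds 8 unsigned bytes with saturation in one 64-bit
// MMX register, then clears the MMX state with emms before returning.
std::uint64_t add_saturate_8bytes(std::uint64_t a, std::uint64_t b)
{
#if defined(__MMX__) && defined(__GNUC__)
    __m64 va, vb;
    std::memcpy(&va, &a, sizeof va);
    std::memcpy(&vb, &b, sizeof vb);
    __m64 vr = _mm_adds_pu8(va, vb);  // saturating byte add (paddusb)
    std::uint64_t r;
    std::memcpy(&r, &vr, sizeof r);
    _mm_empty();                      // emms: x87 code may safely run after this
    return r;
#else
    // Portable fallback with the same result.
    std::uint64_t r = 0;
    for (int shift = 0; shift < 64; shift += 8) {
        std::uint64_t s = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        r |= (s > 255 ? 255 : s) << shift;
    }
    return r;
#endif
}
```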
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden