[Interest] building Qt 4.8.7 with gcc 5 and link-time optimisation on Linux

Sat Jul 25 09:50:06 CEST 2015

Thiago Macieira wrote:

>> > For Clang, QT_COMPILER_SUPPORTS_HERE(x) expands to a check defined(__x__)
>> > [in this case, if __SSE4_2__ is defined].
...
> 
> __SSE4_2__ isn't defined anywhere you'll see. It's pre-defined by the
> compiler.

Well, yes, of course, I know that. I meant the declaration that makes
QT_COMPILER_SUPPORTS_HERE(x) expand to a direct check of __SSE4_2__.
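For readers unfamiliar with the pattern, here is a minimal sketch of that kind of declaration, assuming GCC or Clang. MY_COMPILER_SUPPORTS_HERE and crc32c() are hypothetical names for illustration, not Qt's actual definition. The macro pastes the feature name into the compiler's predefined macro (SSE4_2 becomes __SSE4_2__). As a result, the SSE4.2 branch is selected purely by the compiler's target flags, and the resulting binary uses the crc32 instruction without any run-time check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro in the spirit of QT_COMPILER_SUPPORTS_HERE(x): it pastes
 * the feature name into the compiler's predefined macro (SSE4_2 -> __SSE4_2__).
 * Only meant for use in #if, where an undefined identifier evaluates to 0. */
#define MY_COMPILER_SUPPORTS_HERE(x) (__ ## x ## __)

#if MY_COMPILER_SUPPORTS_HERE(SSE4_2)
#  include <nmmintrin.h>   /* SSE4.2 intrinsics, including _mm_crc32_u8 */
#endif

/* CRC-32C (Castagnoli). The branch is chosen at compile time only, so a
 * binary built with SSE4.2 enabled executes crc32 instructions unconditionally. */
uint32_t crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
#if MY_COMPILER_SUPPORTS_HERE(SSE4_2)
        crc = _mm_crc32_u8(crc, p[i]);
#else
        /* Bitwise fallback: 0x82F63B78 is the bit-reversed polynomial 0x1EDC6F41 */
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
#endif
    }
    return ~crc;
}
```

Built with -msse4.2 (or -march=native on a capable machine), the _mm_crc32_u8 branch is compiled in; otherwise the bitwise fallback is used. Both should return 0xE3069283 for the ASCII string "123456789", the standard CRC-32C check value.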

>> I'm actually a bit surprised that either the compiler finds nothing in the
>> Qt code to auto-vectorise with SSE4 instructions, or that that doesn't lead
>> to issues in the linker with LTO.
> 
> The latter. As I said, my guess is this is a compiler bug because it obviously
> has SSE4.2 enabled.

Well, have you checked that auto-vectorisation really doesn't use SSE4
instructions? In my experience it is in fact not very common: I haven't often run
into related issues with code built with -march=native on one of those VMs I
mentioned, which don't support SSE4 despite virtualising a capable CPU.

> Note that GCC only auto-vectorises on -O3. I don't know about Clang.

I'm quite sure it's the same. Otherwise I'd have continued my habit of using the 
equivalent of -ftree-vectorize :)
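To illustrate where auto-vectorisation can introduce SSE4 instructions, here are two simple loops (hypothetical helpers, not taken from Qt). SSE4.1 adds pmulld (packed 32-bit multiply) and pminsd (packed signed 32-bit minimum). When SSE4.1 is enabled and the vectoriser runs (GCC at -O3, or -O2 -ftree-vectorize), the compiler may use them here. With only SSE2 it has to emulate them with longer instruction sequences. Whether it actually does so depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise 32-bit multiply. With SSE4.1 a vectorising compiler can use
 * pmulld; with only SSE2 it needs a longer pmuludq/shuffle sequence. */
void mul_u32(uint32_t *restrict dst, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * b[i];   /* unsigned, so wrap-around is well defined */
}

/* Element-wise signed 32-bit minimum; SSE4.1 provides pminsd for this. */
void min_i32(int32_t *restrict dst, const int32_t *restrict a,
             const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] < b[i] ? a[i] : b[i];
}
```

To check, compile with something like gcc -O3 -msse4.1 -S and look for pmulld or pminsd in the generated assembly. GCC's -fopt-info-vec option also reports which loops were vectorised.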

> Because it can't be disabled in the compiler. It *always* generates those
> instructions.

...
> That's incorrect. SSE4.2 is enabled in your compiler because you're using
> Apple's build of Clang.

I don't think it's as black-and-white as that ...

>> Also, note that code that has to run on VMs may need to deactivate SSE4
>> support. There is at least 1 virtualisation solution that does not expose
>> the instruction set.
> 
> No VM will ever do that and run OS X code.

Try VirtualBox. Not long ago I still ran into issues that forced me to build
with -march=core2 instead of -march=native, on a host with a recent i5 CPU (and,
come to think of it, with Qt 5.4). I only *had* access to the VM, so I don't know
exactly which instruction sets the host supported, but I don't think the problem
had anything to do with the more recent instruction sets.
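The VM problem comes down to the difference between what the compiler was told to assume and what the CPU advertises at run time. Below is a minimal sketch of comparing the two, assuming GCC or Clang on x86. The function names are hypothetical. __get_cpuid() comes from the compilers' own <cpuid.h>, and SSE4.2 support is reported in bit 20 of ECX for CPUID leaf 1.

```c
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#  include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* What the compiler was told to assume (e.g. via -msse4.2 or -march=native). */
bool built_with_sse42(void)
{
#ifdef __SSE4_2__
    return true;
#else
    return false;
#endif
}

/* What the (possibly virtual) CPU advertises at run time. */
bool cpu_advertises_sse42(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* CPUID leaf 1 not available */
    return (ecx & (1u << 20)) != 0;   /* CPUID.01H:ECX bit 20 = SSE4.2 */
#else
    return false;                     /* not x86: no SSE4.2 at all */
#endif
}

/* True when the binary may execute SSE4.2 instructions that this CPU
 * (e.g. a VM guest whose virtual CPU hides SSE4) does not advertise. */
bool sse42_build_unsafe_here(void)
{
    return built_with_sse42() && !cpu_advertises_sse42();
}
```

A binary may be built with flags that enable SSE4.2 and then run in a guest whose virtual CPU hides that feature. In that case sse42_build_unsafe_here() returns true, which is exactly the situation in which an SSE4.2 instruction can fault with an illegal-instruction error.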

BTW, trying your command on OS X 10.9:

%> clang -dM -E -xc /dev/null | fgrep -i SSE
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
%> clang -march=native -v -dM -E -xc /dev/null | egrep -i 'SSE|AVX|MMX|MUL'
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" 
-cc1 -triple x86_64-apple-macosx10.9.0 -E -disable-free -disable-llvm-verifier
-main-file-name null -mrelocation-model pic -pic-level 2 -mdisable-fp-elim
-masm-verbose -munwind-tables -target-cpu corei7-avx -target-linker-version 241.9
-v
-resource-dir /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0
-fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 117
-stack-protector 1 -mstackrealign -fblocks -fobjc-runtime=macosx-10.9.0
-fencode-extended-block-signature -fdiagnostics-show-option -fcolor-diagnostics
-vectorize-slp -dM -o - -x c /dev/null
clang -cc1 version 6.0 based upon LLVM 3.5svn default target x86_64-apple-darwin13.4.0
[...]
#define __AVX__ 1
#define __MMX__ 1
#define __PCLMUL__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
%> clang -march=native -mno-sse4.1 -dM -E -xc /dev/null | fgrep -i SSE
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
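The last listing also shows that -mno-sse4.1 removed __SSE4_2__ along with __SSE4_1__. Here is a small sketch that records which of these macros a translation unit was compiled with and checks that they nest. It assumes, as GCC's and Clang's option handling does, that each option in the order SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX implies the previous ones. The names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit flags, one per macro seen in the -dM listings above,
 * ordered so that each feature implies the ones below it. */
enum simd_feature {
    FEAT_SSE    = 1u << 0,
    FEAT_SSE2   = 1u << 1,
    FEAT_SSE3   = 1u << 2,
    FEAT_SSSE3  = 1u << 3,
    FEAT_SSE4_1 = 1u << 4,
    FEAT_SSE4_2 = 1u << 5,
    FEAT_AVX    = 1u << 6
};

/* Which feature macros the compiler predefined for this translation unit. */
unsigned simd_features_compiled_in(void)
{
    unsigned mask = 0;
#ifdef __SSE__
    mask |= FEAT_SSE;
#endif
#ifdef __SSE2__
    mask |= FEAT_SSE2;
#endif
#ifdef __SSE3__
    mask |= FEAT_SSE3;
#endif
#ifdef __SSSE3__
    mask |= FEAT_SSSE3;
#endif
#ifdef __SSE4_1__
    mask |= FEAT_SSE4_1;
#endif
#ifdef __SSE4_2__
    mask |= FEAT_SSE4_2;
#endif
#ifdef __AVX__
    mask |= FEAT_AVX;
#endif
    return mask;
}

/* True if every enabled feature also has its predecessor enabled. */
bool simd_features_nested(unsigned mask)
{
    for (unsigned bit = 1; bit <= 6; ++bit)
        if ((mask & (1u << bit)) && !(mask & (1u << (bit - 1))))
            return false;
    return true;
}
```

Compiling the same file with clang -march=native and then adding -mno-sse4.1 should show the SSE4.1 and SSE4.2 bits dropping out together, matching the listing above.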

R.