[Development] Serious ABI issue discovered in -reduce-relocations

Fri Jan 13 15:33:46 CET 2012

Hello

We've got a problem with -reduce-relocations. tl;dr: it's a broken concept and 
we either add a permanent workaround or we stop using it. The permanent 
workaround is to compile all executables in PIC/PIE mode.

Long story:
The -reduce-relocations option in configure checks that the compiler supports 
the linker flag -Bsymbolic-functions. That option was added to binutils in 
2006 at our urging, so that we could use it where the -Bsymbolic option 
presented problems. It turns out that -Bsymbolic-functions has the same 
problems as -Bsymbolic and is no fix.

Those two options cause the linker to "symbolically bind" some symbols into the 
binary it's producing. That is, if a symbol X is used and is also defined 
inside this ELF module, then this option tells the linker that it may rightly 
assume that the symbol will always be inside this module. The linker will then 
use cheaper types of relocation, or none at all. This is a huge performance 
improvement both at load- and at run-time.

-Bsymbolic does it for everything, whereas -Bsymbolic-functions does it for 
functions only.

The reason why we needed -Bsymbolic-functions in the first place is that ELF 
has a weird feature that causes data variables to move between modules. 
Functions weren't affected because they aren't moved.

It turns out that there is one situation in which a function is treated as 
data: when you take its address. For pointers to the same function to compare 
equal, the dynamic linker must resolve the function's address to a single 
place, and unfortunately for us, the choice isn't to our liking. The 
"canonical" address may be moved out of the library and into the executable.

We haven't hit this problem before because we hadn't been doing function 
pointer comparisons. Now, with Olivier's "new connection syntax" patch, we 
are.
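
To make the failure mode concrete, here is a minimal sketch of the kind of 
comparison involved. Every name in it (Callback, valueChanged, clicked, 
indexOfSignal) is hypothetical and it is not how Olivier's patch is written; it 
only shows a lookup that relies on a function's address comparing equal no 
matter where the address was taken.

```cpp
#include <cstddef>

// Hypothetical stand-ins for two signals; not Qt API.
using Callback = int (*)();
int valueChanged() { return 1; }
int clicked() { return 2; }

// Addresses as seen by the module that defines the functions.
static const Callback knownSignals[] = { &valueChanged, &clicked };

// Returns the table index whose address equals `candidate`, or -1 if none.
int indexOfSignal(Callback candidate)
{
    const std::size_t count = sizeof(knownSignals) / sizeof(knownSignals[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (knownSignals[i] == candidate)
            return static_cast<int>(i);
    }
    return -1;
}
```

Built as a single module, indexOfSignal(&clicked) finds its entry. The problem 
described in this mail can only show up when the table lives in a library 
linked with -Bsymbolic-functions and the address passed in was taken in a 
non-PIC executable.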

The possible workaround is to tell the compiler and linker that even 
executables are position-independent. This causes the linker to stop using 
copy/move relocations because it doesn't need them. However, the use of PIC 
may have a non-trivial performance impact on applications, due to indirect 
variable accesses and the loss of one register.
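
The interaction between the two settings can be sketched as a toy model. This 
is only an illustration of the reasoning in this mail, not of any real linker 
code: addresses are plain integers picked by the caller, and all names 
(AddressView, resolveFunctionAddress, addressesAgree) are hypothetical.

```cpp
#include <cstdint>

// Toy model only: which address each side ends up using for one function.
struct AddressView {
    std::uintptr_t seenByExecutable;
    std::uintptr_t seenByLibrary;
};

AddressView resolveFunctionAddress(std::uintptr_t definitionInLibrary,
                                   std::uintptr_t stubInExecutable,
                                   bool executableIsPic,
                                   bool libraryLinkedSymbolically)
{
    // A non-PIC executable hard-codes the address of its own stub, so that
    // stub has to become the canonical address of the function.
    const std::uintptr_t canonical =
        executableIsPic ? definitionInLibrary : stubInExecutable;

    AddressView view;
    view.seenByExecutable = canonical;
    // Symbolic binding makes the library use its own definition directly,
    // bypassing the canonical address.
    view.seenByLibrary =
        libraryLinkedSymbolically ? definitionInLibrary : canonical;
    return view;
}

// True when a pointer comparison across the two modules would succeed.
bool addressesAgree(const AddressView &view)
{
    return view.seenByExecutable == view.seenByLibrary;
}
```

In this model the two sides disagree only for a non-PIC executable combined 
with a symbolically bound library, which is the combination that 
-reduce-relocations produces today.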

Regardless of whether I manage to convince the linker people to improve the 
situation, we need to figure out a solution for existing systems. What shall we 
do?

Even longer story (background):

In code that isn't position-independent (i.e., the executable), a data access 
is done as:
        movl    variable, %eax

And a function call as:
        call    function

And the loading of a function address as:
        movl    $function, %edi
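
For reference, C++ source along these lines would give rise to the three 
instruction forms above when built without PIC, assuming "variable" and 
"function" are only declared here and defined in a shared library. The wrapper 
names are hypothetical, and the two symbols are defined in the sketch only so 
that it builds on its own (which also means a compiler is free to emit 
different code for it).

```cpp
// Assumption: in the scenario from this mail these two live in a shared
// library; they are defined here only to keep the sketch self-contained.
int variable = 42;
int function() { return 7; }

using FunctionPointer = int (*)();

int readVariable() { return variable; }              // data access
int callFunction() { return function(); }            // function call
FunctionPointer takeAddress() { return &function; }  // loading of the address
```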

When linking this program, the linker needs to write the address of the 
variable "variable" and of the function "function" into the instructions (one 
is absolute and the other relative, but that's irrelevant). If both symbols 
are found in a shared library, then the linker will patch them up differently.

For the function, it will make the "call" instruction call to a stub called 
the Procedure Linkage Table (PLT), which then loads the proper address from 
somewhere and then jumps to the proper address. That somewhere is another 
structure called the Global Offset Table, which the dynamic linker will fill 
with the actual function address once the library has been loaded.

For the variable, things get complicated. There's no way to do the PLT trick. 
So what the linker does instead is add a "copy relocation". It writes the name 
of the variable and its expected size and reserves that much in the 
executable. The dynamic linker will then, at load time, find the variable in 
the shared library, copy the contents and then tell the library it should 
instead find the variable in the executable's memory.
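
The copy relocation steps just described can be modelled roughly as follows. 
This is a toy sketch: real ELF modules are not C++ structs, and the names 
(Library, Executable, applyCopyRelocation, gotEntry) are made up for the 
illustration.

```cpp
#include <cstring>

// Toy model of the shared library's side.
struct Library {
    int definition;   // the variable as defined in the shared library
    int *gotEntry;    // where library code looks up the variable's address
};

// Toy model of the executable's side.
struct Executable {
    int reservedCopy; // space the linker reserved for the copy relocation
};

// What the dynamic linker does at load time, per the description above.
void applyCopyRelocation(Library &lib, Executable &exe)
{
    // Copy the initial contents out of the library...
    std::memcpy(&exe.reservedCopy, &lib.definition, sizeof(int));
    // ...and tell the library to use the executable's copy from now on.
    lib.gotEntry = &exe.reservedCopy;
}
```

After the call, a write through lib.gotEntry is visible in exe.reservedCopy, 
while lib.definition is left stale. A library that kept using its own 
definition directly would therefore see a different variable than the 
executable does.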

When using position-independent code options (-fPIC and -fPIE), things change. 
The compiler will write for the function call:
        call    function@PLT

The loading of a function address is:
        movq    function@GOTPCREL(%rip), %rdi

As for the variable, it produces:
        movq    variable@GOTPCREL(%rip), %rax
        movl    (%rax), %eax

All accesses are position-independent and indirect. The call is placed via the 
PLT, addresses are loaded from the GOT and the loading of values is done after 
the actual address is loaded from the GOT.

This is suitable for accessing symbols defined in other ELF modules. It's also 
necessary for library code.

Unfortunately, the side-effect is that access to symbols defined in the current 
ELF module is also done indirectly. Two options help change this: 
-fvisibility=hidden and the symbolic options (-Bsymbolic and 
-Bsymbolic-functions).

The -fvisibility=hidden option is enabled by default in Qt since 4.0 and 
corresponds to the configure option -reduce-exports. It does not change the 
code above, so it means that all variable accesses to variables not defined in 
the same compilation unit are indirect. Fortunately for the function call, the 
linker realises that the target is inside the library and cannot be anywhere 
else, so the call now goes directly to the function. The loading of the address 
is via the GOT, which means a run-time relocation is still necessary, when the 
most efficient solution would be to use the "load effective address" 
instruction with no relocation.
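
As a small illustration of what hidden visibility means at the source level, 
the sketch below marks one function hidden and one exported using the GCC/Clang 
visibility attribute. The macro and function names are hypothetical (Qt has its 
own export macros); with -fvisibility=hidden, unmarked symbols are treated like 
the hidden one here.

```cpp
// Hypothetical macro names; GCC and Clang accept the visibility attribute.
#if defined(__GNUC__)
#  define SKETCH_EXPORT __attribute__((visibility("default")))
#  define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#  define SKETCH_EXPORT
#  define SKETCH_HIDDEN
#endif

// Internal helper: not visible outside the library, so a call to it from
// within the library cannot end up anywhere else.
SKETCH_HIDDEN int internalHelper(int x) { return x * 2; }

// Public entry point: stays in the library's dynamic symbol table.
SKETCH_EXPORT int publicEntryPoint(int x) { return internalHelper(x) + 1; }
```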

The -Bsymbolic and -Bsymbolic-functions options produce the same effect, with 
the difference that the symbol is left in the ELF export table (i.e., "default" 
visibility).

The consequences of all of this are:
 1) there's absolutely no way to get the most efficient code in libraries, 
period. ELF is optimised for executable code, not library.
 2) -Bsymbolic is a broken concept so long as copy relocations remain in use
 3) -Bsymbolic-functions is either the same broken concept or a broken 
implementation. It might be possible to salvage the option by making the 
linker optimise the PLT calls like it does today, but keep the GOT references 
as public.
 4) calling a function via a function pointer is inefficient because of an 
indirect jump. If that function's address was taken in the executable, it's 
doubly inefficient: the indirect jump you make resolves to another indirect 
jump.

The only architecture not affected by this is IA-64. One reason is that IA-64 
ABI mandates that executables also be PIC, so the original problem is gone: 
there are no copy relocations. What's more, Intel engineers realised the 
problem of the indirect loading of data and invented a special relocation that 
the linker is allowed to relax into simpler code. If the symbol is found, at 
link-time, to be in the same ELF module, the linker relaxes the "load" 
generated by the compiler into a "move" between registers.

It's possible to apply the same lessons learned to other platforms, but it 
hasn't been done.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden