[Interest] building Qt 5.9 on Linux - clang or GCC?

Mon Dec 18 22:12:51 CET 2017

On Monday, 18 December 2017 11:55:42 PST René J. V. Bertin wrote:
> Thiago Macieira wrote:
> > It doesn't, because the debug information is not loaded in the first
> > place.
> > When using readelf, note how the "A" flag is missing for those sections.
> 
> So it has to skip certain, possibly considerable parts of the file while
> loading it, rather than simply doing some efficient operation to copy the
> whole file into memory. That should affect load times somewhat, no?

No, that's not how ELF works.

First of all, the dynamic linker doesn't actually read the section table. It 
reads the segment table, found in the ELF program headers (readelf -l):

$ readelf -l /lib/libm.so.6 

Elf file type is DYN (Shared object file)
Entry point 0x6200
There are 7 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0xf9264 0xf9264 R E 0x1000
  LOAD           0x0f9eb4 0x000faeb4 0x000faeb4 0x003cc 0x003d4 RW  0x1000
  DYNAMIC        0x0f9ebc 0x000faebc 0x000faebc 0x00118 0x00118 RW  0x4
  NOTE           0x000114 0x00000114 0x00000114 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x0dda54 0x000dda54 0x000dda54 0x016bc 0x016bc R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x0f9eb4 0x000faeb4 0x000faeb4 0x0014c 0x0014c R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr 
.gnu.version .gnu.version_d .gnu.version_r .rel.dyn .rel.plt .init .plt 
.plt.got .text .fini .rodata .eh_frame_hdr .eh_frame .hash 
   01     .init_array .fini_array .dynamic .got .got.plt .data .bss 
   02     .dynamic 
   03     .note.gnu.build-id .note.ABI-tag 
   04     .eh_frame_hdr 
   05     
   06     .init_array .fini_array .dynamic .got 

(I've pasted libm only for column width; try it yourself on a Qt library 
built with debugging information)

Note the LOAD entries. That's what matters to the dynamic linker and what it 
will load. Note also how the debug sections are not in the first or second 
entries of the section-to-segment mapping list. That means the debugging 
sections are beyond the load regions and won't be present in memory.
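
As a rough sketch of that check: the two LOAD ranges below are copied from the 
readelf output above, but the section offsets and sizes are made up for 
illustration (they are not from libm or any Qt library), and comparing file 
offsets is a simplification of what readelf really does.

```python
# Sketch: does a section's file range fall inside a LOAD segment?
# LOAD ranges come from the readelf -l output above (Offset, FileSiz).
LOAD_SEGMENTS = [
    (0x000000, 0xf9264),  # first LOAD, R E
    (0x0f9eb4, 0x003cc),  # second LOAD, RW
]

def is_loaded(sec_offset, sec_size, segments=LOAD_SEGMENTS):
    """True if the section lies wholly within one LOAD segment's file range."""
    return any(
        seg_off <= sec_offset and sec_offset + sec_size <= seg_off + seg_size
        for seg_off, seg_size in segments
    )

# Hypothetical sections for illustration: name -> (file offset, size)
SECTIONS = {
    ".text": (0x6200, 0xa0000),
    ".data": (0x0fa000, 0x100),
    ".debug_info": (0x0fb000, 0x200000),
}

for name, (off, size) in SECTIONS.items():
    print(name, "loaded" if is_loaded(off, size) else "not loaded")
```

With these assumed offsets, only .debug_info lands outside both LOAD ranges, 
which is the situation described above.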

Second, the binary file is loaded via mmap(), which means the actual file 
contents aren't faulted into memory unless needed or unless there's an 
madvise() system call to tell the kernel to load them. So even if the debug 
sections were included in the LOAD regions, they wouldn't occupy core memory, 
nor would they affect the load time, unless something actually tried to 
access them.
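
A minimal sketch of the same mmap() behaviour, using Python's mmap module 
rather than anything the dynamic linker does. The helper name and the 
temporary file are hypothetical; mmap.madvise() needs Python 3.8 or later and 
a platform that defines MADV_WILLNEED, so the call is guarded.

```python
import mmap
import os
import tempfile

def first_byte_via_mmap(path, prefetch=False):
    """Map a file read-only and touch only its first byte.

    Mapping the file does not read its contents; pages are faulted in
    on first access. With prefetch=True we ask the kernel to read ahead
    via madvise(MADV_WILLNEED), where that is available.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if prefetch and hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                m.madvise(mmap.MADV_WILLNEED)
            return m[0]  # only the page holding byte 0 has to be resident

# Demo on a throwaway 16 MiB file: the whole file is mapped,
# but only the first byte is ever accessed.
fd, demo_path = tempfile.mkstemp()
try:
    os.write(fd, b"\x7fELF" + b"\0" * (16 * 1024 * 1024))
    os.close(fd)
    print(hex(first_byte_via_mmap(demo_path)))
finally:
    os.remove(demo_path)
```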

> > One more reason to use GCC. It only builds once, even under LTO, unless
> > you specifically ask for the fat LTO objects.
> 
> Yet even with GCC the build times and memory requirements are larger with
> LTO than without. How can it not do certain things twice?

The build time has nothing to do with doing things twice. It has to do with 
the amount of work.

Even with LTO, the compiler must start and process each translation unit. The 
difference between LTO and a normal build is that in the former, it needs to 
do less work since it doesn't actually run the optimiser. It just needs to 
dump some intermediary information.

The difference is with the linker. In a regular build, even with -Wl,-O1, the 
linker does very little and its job is to basically concatenate sections of 
each input file. In an LTO build, the linker calls the compiler again and that 
will need to reload all the intermediary information and perform the 
optimisation, now with a much larger dataset.

In my experience, a slim LTO build (that is, without fat LTO objects) is 
actually faster (and produces better code) than an equivalent non-LTO build, 
but that doesn't apply to all cases.

Regular, optimised (-O3 -g1) build of qmake:
	Time to build: 268,00s user 11,28s system 368% cpu 1:15,87 total
	Total object sizes (kB): 69596
	Binary size (after stripping):
   text    data     bss     dec     hex filename
3008485    2080    6361 3016926  2e08de ../bin/qmake

Simple LTO build (-O3 -g1 -flto -fno-fat-lto-objects, linking* -flto=4):
	Time: 208,01s user 10,36s system 365% cpu 59,731 total
	Total object sizes: 32476
	Binary:
   text    data     bss     dec     hex filename
2427597    1972    6217 2435786  252aca ../bin/qmake

Fat LTO build (-O3 -g1 -flto -ffat-lto-objects, linking* -flto=4):
	Time: 371,19s user 13,49s system 369% cpu 1:44,11 total
	Sizes: 101928
	Binary:
   text    data     bss     dec     hex filename
2427597    1972    6217 2435786  252aca ../bin/qmake

*: Don't forget to pass -O3 -g1 to the linker too, otherwise the LTO step 
won't optimise!
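
To put the three qmake builds side by side, here is a small sketch that 
computes percentage changes relative to the regular build. The numbers are 
the ones reported above (wall-clock time converted to seconds); the 
dictionary layout and the helper name are just for illustration.

```python
# Figures copied from the three qmake builds above.
# wall: total wall-clock time in seconds; objs: total object sizes in kB;
# text: size of the text section of the stripped binary in bytes.
BUILDS = {
    "regular":    {"wall": 75.87,  "objs": 69596,  "text": 3008485},
    "simple LTO": {"wall": 59.731, "objs": 32476,  "text": 2427597},
    "fat LTO":    {"wall": 104.11, "objs": 101928, "text": 2427597},
}

def pct_change(baseline, value):
    """Percentage change of value relative to baseline (negative = smaller)."""
    return (value - baseline) / baseline * 100.0

base = BUILDS["regular"]
for name in ("simple LTO", "fat LTO"):
    b = BUILDS[name]
    print(
        f"{name}: wall {pct_change(base['wall'], b['wall']):+.1f}%, "
        f"objects {pct_change(base['objs'], b['objs']):+.1f}%, "
        f"text {pct_change(base['text'], b['text']):+.1f}%"
    )
```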

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center