[Development] QMutex with pthread on Linux

Thiago Macieira thiago.macieira at intel.com
Tue Aug 21 22:36:38 CEST 2012


Hello

I've just done some benchmarking of QMutex on Linux, using the pthread 
implementation instead of the futex one.

Conclusions first:

QMutex is optimised for the uncontended case. It does that by keeping the d
pointer at NULL while unlocked and using the value 0x3 to indicate that it is
locked. Switching between the two values is extremely quick, requiring only a
single atomic operation. When uncontended, QMutex proves to be roughly 16%
faster than pthread. This also shows in the benchmarks that use a non-zero
msleep, where the mutex is mostly uncontended.
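
To illustrate the fast path described above, here is a minimal sketch, not
Qt's source: a lock word that is 0 (like a null d pointer) while unlocked and
holds the marker 0x3 while locked, so lock and unlock are each one atomic
operation when nobody else is around. The class name FastPathMutex, the
helper contendedCount() and the spin-and-yield slow path are all assumptions
made for this example; a real mutex would put waiters to sleep instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Sketch only: 0 = unlocked (like a null d pointer), 0x3 = locked.
// This is NOT Qt's QMutex implementation.
class FastPathMutex {
public:
    bool tryLock() {
        std::uintptr_t expected = kUnlocked;
        // One atomic compare-and-swap: 0 -> 0x3.
        return word_.compare_exchange_strong(expected, kLocked,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    void lock() {
        if (tryLock())
            return;   // uncontended fast path: no allocation, no syscall
        lockSlow();   // contended path
    }

    void unlock() {
        // One atomic exchange back to 0.
        std::uintptr_t previous = word_.exchange(kUnlocked, std::memory_order_release);
        assert(previous == kLocked && "unlock() on a mutex that was not locked");
        (void)previous;
    }

private:
    static constexpr std::uintptr_t kUnlocked = 0x0;
    static constexpr std::uintptr_t kLocked = 0x3;

    // Placeholder slow path (assumption for this sketch): spin and yield.
    // A real mutex parks waiters (futex, semaphore, ...), and that
    // machinery is where the contended-case cost comes from.
    void lockSlow() {
        while (!tryLock()) {
            while (word_.load(std::memory_order_relaxed) != kUnlocked)
                std::this_thread::yield();
        }
    }

    std::atomic<std::uintptr_t> word_{kUnlocked};
};

// Hypothetical helper: several threads increment a shared counter under the mutex.
long contendedCount(int threadCount, int iterations) {
    FastPathMutex mutex;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.lock();
                ++counter;
                mutex.unlock();
            }
        });
    for (std::thread& th : threads)
        th.join();
    return counter;
}
```

contendedCount(4, 100000) should return 400000; any other value would mean
the lock failed to exclude a thread.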

That comes at a price, though: the performance drops considerably when 
contention happens.

When contention happens at a low rate (the "msleep(0)" case), QMutex 
performance is similar to that of pthread, though slightly worse (up to 5%).

When contention happens a lot, the performance is awful: I've measured
anything from 100% slower to over 1000% slower than pthread.

Extrapolating these results to Mac and Windows, I expect QMutex to perform
*much* better than the native mutexes in the uncontended case, but still to
lose horribly in the contended case.

Conclusion: I'm glad I use Linux and that we have futex.
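
For readers who haven't looked at futexes: below is a minimal sketch of the
textbook three-state futex mutex (0 = unlocked, 1 = locked, 2 = locked with
possible waiters, as described in Ulrich Drepper's "Futexes Are Tricky"). It
is not the code in QMutex's Linux backend; it only shows why the futex
approach holds up under contention: the uncontended path never enters the
kernel, and contended waiters sleep in FUTEX_WAIT instead of spinning. It is
Linux-only and assumes std::atomic<int> has the same layout as int. The names
FutexMutex and futexContendedCount() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>
#include <vector>

// States: 0 = unlocked, 1 = locked (no waiters), 2 = locked, waiters possible.
class FutexMutex {
public:
    void lock() {
        int c = 0;
        // Fast path: 0 -> 1 with one CAS, no system call.
        if (state_.compare_exchange_strong(c, 1, std::memory_order_acquire))
            return;
        // Contended: mark the lock as "has waiters" and sleep until it is free.
        if (c != 2)
            c = state_.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            futex(FUTEX_WAIT_PRIVATE, 2);  // sleeps only while state_ is still 2
            c = state_.exchange(2, std::memory_order_acquire);
        }
    }

    void unlock() {
        // fetch_sub returns the previous value: 1 means nobody was waiting.
        int previous = state_.fetch_sub(1, std::memory_order_release);
        assert(previous != 0 && "unlock() on a mutex that was not locked");
        if (previous != 1) {
            state_.store(0, std::memory_order_release);
            futex(FUTEX_WAKE_PRIVATE, 1);  // wake one sleeping waiter
        }
    }

private:
    long futex(int op, int val) {
        // Assumes std::atomic<int> is layout-compatible with int.
        return syscall(SYS_futex, reinterpret_cast<int*>(&state_), op, val,
                       nullptr, nullptr, 0);
    }

    std::atomic<int> state_{0};
};

// Hypothetical helper: threads increment a shared counter under the mutex.
long futexContendedCount(int threadCount, int iterations) {
    FutexMutex mutex;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.lock();
                ++counter;
                mutex.unlock();
            }
        });
    for (std::thread& th : threads)
        th.join();
    return counter;
}
```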

DATA:

Reference:
Intel i7-2620M (SandyBridge)
	2 cores x 2 threads, 2.7 GHz, turbo to 3.3 GHz
	CPU in "performance" governor
Linux 3.5.2
glibc 2.15
Fedora 17
GCC 4.7.1, 64-bit mode
QtCore linked with LTO

All results are the best out of 6 runs, under realtime FIFO scheduling.
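
The benchmark source isn't part of this mail. Purely to illustrate the kind
of setup the results come from (several threads hammering one mutex,
optionally with msleep(0) between iterations, keeping the best of N runs),
here is a hypothetical harness. Everything in it is an assumption: the names
RunResult and bestOfRuns(), modelling msleep(0) as a zero-length nanosleep(),
and measuring wall time with std::chrono instead of CPU ticks. The mail
doesn't say how the "1 mutex" and "2 mutexes" variants differ, so only a
single mutex is modelled. With libstdc++ on Linux, std::mutex wraps a
pthread_mutex_t, so it plays the role of the "Native" variant.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#include <mutex>
#include <thread>
#include <time.h>
#include <vector>

// Hypothetical harness: NOT the tst_QMutex source.
struct RunResult {
    std::int64_t bestNanoseconds;  // fastest run, as in "best out of N runs"
    std::int64_t increments;       // work done in the last run, for sanity checks
};

template <typename Mutex>
RunResult bestOfRuns(int runs, int threadCount, int iterations, bool zeroSleep) {
    RunResult result{std::numeric_limits<std::int64_t>::max(), 0};
    for (int run = 0; run < runs; ++run) {
        Mutex mutex;
        std::int64_t counter = 0;
        auto worker = [&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.lock();
                ++counter;
                mutex.unlock();
                if (zeroSleep) {
                    // Assumed stand-in for msleep(0): a zero-length nanosleep syscall.
                    timespec ts{0, 0};
                    nanosleep(&ts, nullptr);
                }
            }
        };
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        for (int t = 0; t < threadCount; ++t)
            threads.emplace_back(worker);
        for (std::thread& th : threads)
            th.join();
        // Wall time for the whole run, including thread start-up.
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           std::chrono::steady_clock::now() - start).count();
        result.bestNanoseconds = std::min<std::int64_t>(result.bestNanoseconds, elapsed);
        result.increments = counter;
    }
    return result;
}
```

Realtime FIFO scheduling is not set up by the sketch; one way is to launch
the binary under `chrt --fifo 1`, which normally needs elevated privileges.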

Uncontended Mutex results (100 million iterations):

RESULT : tst_QMutex::uncontendedNative():
     60.5891925 CPU ticks per iteration
        450.189192 task-clock                #    0.999 CPUs utilized          
     1,511,489,291 cycles                    #    3.357 GHz                    
     1,306,287,711 instructions              #    0.86  insns per cycle        
               197 raw_syscalls:sys_enter    #    0.438 K/sec                  
       0.450477229 seconds time elapsed

RESULT : tst_QMutex::uncontendedQMutex():
     50.7105596 CPU ticks per iteration
        379.784144 task-clock                #    0.999 CPUs utilized          
     1,268,507,621 cycles                    #    3.340 GHz                    
       745,975,928 instructions              #    0.59  insns per cycle        
               194 raw_syscalls:sys_enter    #    0.511 K/sec                  
       0.380036271 seconds time elapsed

Contended Mutex results (1000 iterations):

RESULT : tst_QMutex::contendedNative():"no msleep, 1 mutex":
     2,052,212.507 CPU ticks per iteration
       5814.825257 task-clock                #    3.797 CPUs utilized          
    18,513,286,444 cycles                    #    3.184 GHz                    
    13,801,932,519 instructions              #    0.75  insns per cycle        
         8,609,051 raw_syscalls:sys_enter    #    1.481 M/sec                  
       1.531495948 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
     4,087,893.432 CPU ticks per iteration
      11037.507260 task-clock                #    2.699 CPUs utilized          
    33,483,481,790 cycles                    #    3.034 GHz                    
    21,436,137,659 instructions              #    0.64  insns per cycle        
        12,012,804 raw_syscalls:sys_enter    #    1.088 M/sec                  
       4.088957193 seconds time elapsed

The other five runs gave 4.2, 5.7, 5.8, 6.7 and 7.1 million ticks per iteration.

RESULT : tst_QMutex::contendedNative():"no msleep, 2 mutexes":
     2,550,929.603 CPU ticks per iteration
       7155.513345 task-clock                #    3.763 CPUs utilized          
    22,760,839,897 cycles                    #    3.181 GHz                    
    16,370,712,299 instructions              #    0.72  insns per cycle        
        10,457,934 raw_syscalls:sys_enter    #    1.462 M/sec                  
       1.901400808 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
     29,396,174.807 CPU ticks per iteration
      48627.618792 task-clock                #    2.183 CPUs utilized          
   141,749,504,525 cycles                    #    2.915 GHz                    
    78,008,558,700 instructions              #    0.55  insns per cycle        
        38,536,844 raw_syscalls:sys_enter    #    0.792 M/sec                  
      22.271697343 seconds time elapsed

Contended Mutex results with msleep(0) (100 iterations):
RESULT : tst_QMutex::contendedNative():"msleep(0), 1 mutex":
     67,621,168.46 CPU ticks per iteration
       4326.998212 task-clock                #    0.859 CPUs utilized          
    11,239,050,634 cycles                    #    2.597 GHz                    
     8,415,799,134 instructions              #    0.75  insns per cycle        
         2,965,384 raw_syscalls:sys_enter    #    0.685 M/sec                  
       5.036652093 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"msleep(0), 1 mutex":
     70,621,368.59 CPU ticks per iteration
       4909.514006 task-clock                #    0.934 CPUs utilized          
    13,123,468,429 cycles                    #    2.673 GHz                    
     9,532,793,349 instructions              #    0.73  insns per cycle        
         3,619,607 raw_syscalls:sys_enter    #    0.737 M/sec                  
       5.253921952 seconds time elapsed

RESULT : tst_QMutex::contendedNative():"msleep(0), 2 mutexes":
     67,478,669.37 CPU ticks per iteration
       4314.232114 task-clock                #    0.857 CPUs utilized          
    11,244,572,017 cycles                    #    2.606 GHz                    
     8,382,057,867 instructions              #    0.75  insns per cycle        
         2,939,351 raw_syscalls:sys_enter    #    0.681 M/sec                  
       5.035212837 seconds time elapsed

RESULT : tst_QMutex::contendedQMutex():"msleep(0), 2 mutexes":
     70,837,078.76 CPU ticks per iteration
       4933.702732 task-clock                #    0.929 CPUs utilized          
    13,192,133,179 cycles                    #    2.674 GHz                    
     9,554,807,698 instructions              #    0.72  insns per cycle        
         3,622,623 raw_syscalls:sys_enter    #    0.734 M/sec                  
       5.309986829 seconds time elapsed
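
The percentages in the conclusions can be recomputed from the best-run "CPU
ticks per iteration" figures above. The small sketch below does just that; the
names Comparison, kResults, qmutexPercentSlower() and printComparisons() are
made up for the example, and the numbers are copied from the results.

```cpp
#include <cassert>
#include <cstdio>

// Best-of-6 "CPU ticks per iteration" values copied from the results above.
struct Comparison {
    const char* label;
    double nativeTicks;  // tst_QMutex::*Native (pthread)
    double qmutexTicks;  // tst_QMutex::*QMutex
};

const Comparison kResults[] = {
    {"uncontended",          60.5891925,   50.7105596},
    {"no msleep, 1 mutex",   2052212.507,  4087893.432},
    {"no msleep, 2 mutexes", 2550929.603,  29396174.807},
    {"msleep(0), 1 mutex",   67621168.46,  70621368.59},
    {"msleep(0), 2 mutexes", 67478669.37,  70837078.76},
};

// Percentage by which QMutex needs more ticks than pthread
// (negative means QMutex is faster).
double qmutexPercentSlower(const Comparison& c) {
    assert(c.nativeTicks > 0);
    return (c.qmutexTicks / c.nativeTicks - 1.0) * 100.0;
}

void printComparisons() {
    for (const Comparison& c : kResults)
        std::printf("%-22s %+8.1f%%\n", c.label, qmutexPercentSlower(c));
}
```

This gives about -16.3% for the uncontended case, +4.4% and +5.0% for
msleep(0), and +99.2% and +1052.4% for the no-msleep cases, matching the
"16% faster", "up to 5%" and "100% to over 1000%" figures above.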

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden