[Development] QMutex with pthread on Linux

Wed Aug 22 09:00:05 CEST 2012

On Aug 21, 2012, at 10:36 PM, ext Thiago Macieira <thiago.macieira at intel.com> wrote:

> Hello
> 
> I've just done some benchmarking of QMutex on Linux, using the pthread 
> implementation instead of the futex one.
> 
> Conclusions first:
> 
> QMutex is optimised for the uncontended case. It does that by keeping the d 
> pointer at NULL while unlocked and using 0x3 to indicate that it's locked. 
> Changing from one value to the other is extremely quick, requiring a simple 
> atomic operation. QMutex when uncontended proves to be roughly 16% faster than 
> pthread. This also shows in the benchmarks that use non-zero msleep: the mutex 
> is mostly uncontended.
> 
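
For illustration, the uncontended fast path described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual QMutex code: the class name SketchMutex and its member functions are hypothetical, std::atomic stands in for Qt's own atomic classes, and the contended slow path (where a thread has to wait) is left out entirely.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical illustration only; not the real QMutex implementation.
// The "d pointer" is nullptr while unlocked and holds the marker value 0x3
// while locked, as described in the mail. No slow path is implemented.
class SketchMutex {
public:
    // Fast-path lock: one compare-and-swap from nullptr to the 0x3 marker.
    // Returns false if the mutex was already locked (a real mutex would
    // fall through to its contended slow path here).
    bool tryLockFast() {
        void *expected = nullptr;
        return d.compare_exchange_strong(expected, lockedMarker(),
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
    }

    // Fast-path unlock: one compare-and-swap from the 0x3 marker to nullptr.
    // Returns false if the pointer did not hold the marker value.
    bool unlockFast() {
        void *expected = lockedMarker();
        return d.compare_exchange_strong(expected, nullptr,
                                         std::memory_order_release,
                                         std::memory_order_relaxed);
    }

    bool isLocked() const {
        return d.load(std::memory_order_relaxed) != nullptr;
    }

private:
    static void *lockedMarker() {
        return reinterpret_cast<void *>(std::uintptr_t(0x3));
    }

    std::atomic<void *> d{nullptr};
};
```

In this sketch both lock and unlock are a single atomic operation with no system call, which is consistent with the uncontended numbers below. Everything that makes the contended case expensive lives in the part that is omitted here.
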
> That comes at a price, though: the performance drops considerably when 
> contention happens.
> 
> When contention happens at a low rate (the "msleep(0)" case), QMutex 
> performance is similar to that of pthread, though slightly worse (up to 5%).
> 
> When contention happens a lot, the performance is awful. I've measured 
> anything from 100% slower to over 1000%.

Uhhh… is there a reason why it's that much slower than the pthread mutex? A factor of 2 to 10 slower is not really what we should have.

Cheers,
Lars

> 
> Extrapolating these results to Mac and Windows, I expect QMutex performance in 
> the uncontended case to be *much* better, but still to lose horribly in the 
> contended case.
> 
> Conclusion: I'm glad I use Linux and that we have futex.
> 
> DATA:
> 
> Reference:
> Intel i7-2620M (SandyBridge)
> 	2 cores x 2 threads, 2.7 GHz, turbo to 3.3 GHz
> 	CPU in "performance" governor
> Linux 3.5.2
> glibc 2.15
> Fedora 17
> GCC 4.7.1, 64-bit mode
> QtCore linked with LTO
> 
> All results are the best out of 6 runs, under realtime FIFO scheduling.
> 
> Uncontended Mutex results (100 million iterations):
> 
> RESULT : tst_QMutex::uncontendedNative():
>     60.5891925 CPU ticks per iteration
>        450.189192 task-clock                #    0.999 CPUs utilized          
>     1,511,489,291 cycles                    #    3.357 GHz                    
>     1,306,287,711 instructions              #    0.86  insns per cycle        
>               197 raw_syscalls:sys_enter    #    0.438 K/sec                  
>       0.450477229 seconds time elapsed
> 
> RESULT : tst_QMutex::uncontendedQMutex():
>     50.7105596 CPU ticks per iteration
>        379.784144 task-clock                #    0.999 CPUs utilized          
>     1,268,507,621 cycles                    #    3.340 GHz                    
>       745,975,928 instructions              #    0.59  insns per cycle        
>               194 raw_syscalls:sys_enter    #    0.511 K/sec                  
>       0.380036271 seconds time elapsed
> 
> Contended Mutex results (1000 iterations):
> 
> RESULT : tst_QMutex::contendedNative():"no msleep, 1 mutex":
>     2,052,212.507 CPU ticks per iteration
>       5814.825257 task-clock                #    3.797 CPUs utilized          
>    18,513,286,444 cycles                    #    3.184 GHz                    
>    13,801,932,519 instructions              #    0.75  insns per cycle        
>         8,609,051 raw_syscalls:sys_enter    #    1.481 M/sec                  
>       1.531495948 seconds time elapsed
> 
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
>     4,087,893.432 CPU ticks per iteration
>      11037.507260 task-clock                #    2.699 CPUs utilized          
>    33,483,481,790 cycles                    #    3.034 GHz                    
>    21,436,137,659 instructions              #    0.64  insns per cycle        
>        12,012,804 raw_syscalls:sys_enter    #    1.088 M/sec                  
>       4.088957193 seconds time elapsed
> 
> Other results were: 4.2, 5.7, 5.8, 6.7, 7.1 million ticks.
> 
> RESULT : tst_QMutex::contendedNative():"no msleep, 2 mutexes":
>     2,550,929.603 CPU ticks per iteration
>       7155.513345 task-clock                #    3.763 CPUs utilized          
>    22,760,839,897 cycles                    #    3.181 GHz                    
>    16,370,712,299 instructions              #    0.72  insns per cycle        
>        10,457,934 raw_syscalls:sys_enter    #    1.462 M/sec                  
>       1.901400808 seconds time elapsed
> 
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
>     29,396,174.807 CPU ticks per iteration
>      48627.618792 task-clock                #    2.183 CPUs utilized          
>   141,749,504,525 cycles                    #    2.915 GHz                    
>    78,008,558,700 instructions              #    0.55  insns per cycle        
>        38,536,844 raw_syscalls:sys_enter    #    0.792 M/sec                  
>      22.271697343 seconds time elapsed
> 
> Contended Mutex results with msleep(0) (100 iterations):
> 
> RESULT : tst_QMutex::contendedNative():"msleep(0), 1 mutex":
>     67,621,168.46 CPU ticks per iteration
>       4326.998212 task-clock                #    0.859 CPUs utilized          
>    11,239,050,634 cycles                    #    2.597 GHz                    
>     8,415,799,134 instructions              #    0.75  insns per cycle        
>         2,965,384 raw_syscalls:sys_enter    #    0.685 M/sec                  
>       5.036652093 seconds time elapsed
> 
> RESULT : tst_QMutex::contendedQMutex():"msleep(0), 1 mutex":
>     70,621,368.59 CPU ticks per iteration
>       4909.514006 task-clock                #    0.934 CPUs utilized          
>    13,123,468,429 cycles                    #    2.673 GHz                    
>     9,532,793,349 instructions              #    0.73  insns per cycle        
>         3,619,607 raw_syscalls:sys_enter    #    0.737 M/sec                  
>       5.253921952 seconds time elapsed
> 
> RESULT : tst_QMutex::contendedNative():"msleep(0), 2 mutexes":
>     67,478,669.37 CPU ticks per iteration
>       4314.232114 task-clock                #    0.857 CPUs utilized          
>    11,244,572,017 cycles                    #    2.606 GHz                    
>     8,382,057,867 instructions              #    0.75  insns per cycle        
>         2,939,351 raw_syscalls:sys_enter    #    0.681 M/sec                  
>       5.035212837 seconds time elapsed
> 
> RESULT : tst_QMutex::contendedQMutex():"msleep(0), 2 mutexes":
>     70,837,078.76 CPU ticks per iteration
>       4933.702732 task-clock                #    0.929 CPUs utilized          
>    13,192,133,179 cycles                    #    2.674 GHz                    
>     9,554,807,698 instructions              #    0.72  insns per cycle        
>         3,622,623 raw_syscalls:sys_enter    #    0.734 M/sec                  
>       5.309986829 seconds time elapsed
> 
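
To relate the percentages in the conclusions to the raw numbers, the "CPU ticks per iteration" figures above can be compared directly. The following is a small sketch of that arithmetic; the function names percentSlower, percentFewerTicks and printSummary are made up for this example, and the constants are copied from the results listed above.

```cpp
#include <cstdio>

// How many percent more ticks `slow` needs compared to `fast`.
double percentSlower(double slow, double fast) {
    return (slow / fast - 1.0) * 100.0;
}

// How many percent fewer ticks `fast` needs compared to `slow`.
double percentFewerTicks(double fast, double slow) {
    return (1.0 - fast / slow) * 100.0;
}

// Prints the QMutex vs. pthread comparison for each benchmark pair,
// using the CPU ticks per iteration reported in the mail.
void printSummary() {
    std::printf("uncontended:            QMutex needs %.1f%% fewer ticks\n",
                percentFewerTicks(50.7105596, 60.5891925));
    std::printf("no msleep, 1 mutex:     QMutex is %.1f%% slower\n",
                percentSlower(4087893.432, 2052212.507));
    std::printf("no msleep, 2 mutexes:   QMutex is %.1f%% slower\n",
                percentSlower(29396174.807, 2550929.603));
    std::printf("msleep(0), 1 mutex:     QMutex is %.1f%% slower\n",
                percentSlower(70621368.59, 67621168.46));
    std::printf("msleep(0), 2 mutexes:   QMutex is %.1f%% slower\n",
                percentSlower(70837078.76, 67478669.37));
}
```

Calling printSummary() from main() gives roughly 16% fewer ticks for the uncontended case, a factor of about 2 for one heavily contended mutex, a factor of about 11.5 for two, and 4 to 5% for the msleep(0) cases. Note that this uses only the best-of-6 figures; the other QMutex runs listed for "no msleep, 1 mutex" were slower still.
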
> -- 
> Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel Open Source Technology Center
>     Intel Sweden AB - Registration Number: 556189-6027
>     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden