[Development] QMutex with pthread on Linux
Thiago Macieira
thiago.macieira at intel.com
Wed Aug 22 19:58:27 CEST 2012
On Tuesday, 21 August 2012 22:36:38, Thiago Macieira wrote:
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
> 4,087,893.432 CPU ticks per iteration
> 11037.507260 task-clock # 2.699 CPUs utilized
> 33,483,481,790 cycles # 3.034 GHz
> 21,436,137,659 instructions # 0.64 insns per cycle
> 12,012,804 raw_syscalls:sys_enter # 1.088 M/sec
> 4.088957193 seconds time elapsed
>
> Other results were: 4.2, 5.7, 5.8, 6.7, 7.1 million ticks.
Here are the results after the rewrite, without adaptive locking (see below):
RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
3,364,698.345 CPU ticks per iteration
8775.205691 task-clock # 2.924 CPUs utilized
26,978,578,571 cycles # 3.074 GHz
18,091,438,451 instructions # 0.67 insns per cycle
10,460,523 raw_syscalls:sys_enter # 1.192 M/sec
3.001549490 seconds time elapsed
One run took 4.04 seconds with 4.9 million ticks, but all the other
numbers are the same. I can't explain why the tick counter is so much
higher for that one.
With adaptive locking:
RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
1,919,764.064 CPU ticks per iteration
5404.168638 task-clock # 3.783 CPUs utilized
17,199,382,533 cycles # 3.183 GHz
13,052,044,286 instructions # 0.76 insns per cycle
8,071,929 raw_syscalls:sys_enter # 1.494 M/sec
1.428415478 seconds time elapsed
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
> 29,396,174.807 CPU ticks per iteration
> 48627.618792 task-clock # 2.183 CPUs utilized
> 141,749,504,525 cycles # 2.915 GHz
> 78,008,558,700 instructions # 0.55 insns per cycle
> 38,536,844 raw_syscalls:sys_enter # 0.792 M/sec
> 22.271697343 seconds time elapsed
Without adaptive locking:
RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
28,641,366.537 CPU ticks per iteration
47886.578653 task-clock # 2.218 CPUs utilized
139,684,008,827 cycles # 2.917 GHz
76,540,168,881 instructions # 0.55 insns per cycle
38,837,066 raw_syscalls:sys_enter # 0.811 M/sec
21.586443075 seconds time elapsed
I.e., roughly the same.
With adaptive locking:
RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
1,961,622.638 CPU ticks per iteration
5561.854224 task-clock # 3.781 CPUs utilized
17,706,600,180 cycles # 3.184 GHz
13,209,273,979 instructions # 0.75 insns per cycle
8,072,609 raw_syscalls:sys_enter # 1.451 M/sec
1.471046980 seconds time elapsed
Adaptive locking is a busy-wait spin ahead of the sleep, iterating 1000 times
trying to acquire the mutex. The Qt 4 solution was time-based, whereas the one
I'm implementing uses a fixed number of iterations. It's similar to glibc's
solution, which also spins for a fixed number of iterations.
Note that the "without adaptive locking" version still tries to acquire the
mutex once more before sleeping. Without that, the results are much, much
worse. I decided that trying once was an acceptable comparison because
Olivier's original also tries to lock once before sleeping.
In *this* particular case, it runs in less time and with less CPU time, but
that doesn't hold in other cases. In the msleep(2) case, it runs in about the
same time as pthread, but it uses roughly 33% more CPU.
Conclusion: the biggest gain is the adaptive locking, even though it
introduces a busy-wait. I'd recommend keeping it and making it smarter, really
*adapting* to how often the mutex is contended.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden