[Development] QMutex with pthread on Linux
Thiago Macieira
thiago.macieira at intel.com
Wed Aug 22 19:58:27 CEST 2012
On Tuesday, 21 August 2012 22:36:38, Thiago Macieira wrote:
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
> 4,087,893.432 CPU ticks per iteration
> 11037.507260 task-clock # 2.699 CPUs utilized
> 33,483,481,790 cycles # 3.034 GHz
> 21,436,137,659 instructions # 0.64 insns per cycle
> 12,012,804 raw_syscalls:sys_enter # 1.088 M/sec
> 4.088957193 seconds time elapsed
>
> Other results were: 4.2, 5.7, 5.8, 6.7, 7.1 million ticks.
Here are the results after the rewrite, without adaptive locking (see below):
RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
3,364,698.345 CPU ticks per iteration
8775.205691 task-clock # 2.924 CPUs utilized
26,978,578,571 cycles # 3.074 GHz
18,091,438,451 instructions # 0.67 insns per cycle
10,460,523 raw_syscalls:sys_enter # 1.192 M/sec
3.001549490 seconds time elapsed
One run took 4.04 seconds with 4.9 million ticks, but all the other
numbers are the same. I can't explain why the tick counter is so much
higher for that one.
With adaptive locking:
RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
1,919,764.064 CPU ticks per iteration
5404.168638 task-clock # 3.783 CPUs utilized
17,199,382,533 cycles # 3.183 GHz
13,052,044,286 instructions # 0.76 insns per cycle
8,071,929 raw_syscalls:sys_enter # 1.494 M/sec
1.428415478 seconds time elapsed
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
> 29,396,174.807 CPU ticks per iteration
> 48627.618792 task-clock # 2.183 CPUs utilized
> 141,749,504,525 cycles # 2.915 GHz
> 78,008,558,700 instructions # 0.55 insns per cycle
> 38,536,844 raw_syscalls:sys_enter # 0.792 M/sec
> 22.271697343 seconds time elapsed
Without adaptive locking:
RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
28,641,366.537 CPU ticks per iteration
47886.578653 task-clock # 2.218 CPUs utilized
139,684,008,827 cycles # 2.917 GHz
76,540,168,881 instructions # 0.55 insns per cycle
38,837,066 raw_syscalls:sys_enter # 0.811 M/sec
21.586443075 seconds time elapsed
I.e., roughly the same.
With adaptive locking:
RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
1,961,622.638 CPU ticks per iteration
5561.854224 task-clock # 3.781 CPUs utilized
17,706,600,180 cycles # 3.184 GHz
13,209,273,979 instructions # 0.75 insns per cycle
8,072,609 raw_syscalls:sys_enter # 1.451 M/sec
1.471046980 seconds time elapsed
Adaptive locking is a busy-wait spin ahead of the sleep, iterating 1000 times
trying to acquire the mutex. The Qt 4 solution was time-based, whereas the one
I'm implementing uses a fixed number of iterations. It's similar to glibc's
solution, which also spins for a fixed number of iterations.
Note that the "without adaptive locking" version still tries to acquire the
mutex once more before sleeping. Without that, the results are much, much
worse. I decided that trying once was an acceptable comparison because
Olivier's original also tries to lock once before sleeping.
In *this* particular case, it runs in less time and with less CPU time, but
that doesn't hold in other cases. In the msleep(2) case, it runs in about the
same time as pthread, but it uses roughly 33% more CPU.
Conclusion: the biggest gain is the adaptive locking, even though it
introduces a busy-wait. I'd recommend keeping it and making it smarter, really
*adapting* to how often the mutex is contended.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden