[Development] QMutex with pthread on Linux
Thiago Macieira
thiago.macieira at intel.com
Tue Aug 21 22:36:38 CEST 2012
Hello
I've just done some benchmarking of QMutex on Linux, using the pthread
implementation instead of the futex one.
Conclusions first:
QMutex is optimised for the uncontended case. It does that by keeping the d
pointer at NULL while unlocked and setting it to 0x3 to indicate it's locked.
Changing from one value to the other is extremely quick, requiring a single
atomic operation. When uncontended, QMutex proves to be roughly 16% faster
than pthread. This also shows in the benchmarks that use non-zero msleep,
where the mutex is mostly uncontended.
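As a rough illustration of that fast path, here is a minimal C++ sketch using
std::atomic. It is not the QMutex source: the SketchMutex class and its member
names are made up for this example, and it only covers the uncontended
transitions between NULL and 0x3. What happens once a second thread shows up
(the slow path) is left out entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a mutex whose whole state is one pointer.
// Not Qt's class; names and layout are assumptions for illustration.
class SketchMutex {
public:
    // Fast lock: one compare-and-swap from NULL (unlocked) to the marker.
    // Returns false if the mutex is already held; a real mutex would then
    // enter its slow path, which this sketch does not implement.
    bool tryLockFast() noexcept {
        void *expected = nullptr;
        return d.compare_exchange_strong(expected, lockedMarker(),
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
    }

    // Fast unlock: one compare-and-swap from the marker back to NULL.
    // Returns false if the pointer did not hold the marker.
    bool unlockFast() noexcept {
        void *expected = lockedMarker();
        return d.compare_exchange_strong(expected, nullptr,
                                         std::memory_order_release,
                                         std::memory_order_relaxed);
    }

    bool isLocked() const noexcept {
        return d.load(std::memory_order_relaxed) != nullptr;
    }

private:
    // The value 0x3 is never a valid object address; it is only a marker.
    static void *lockedMarker() noexcept {
        return reinterpret_cast<void *>(std::uintptr_t{0x3});
    }

    std::atomic<void *> d{nullptr};
};

// Usage: each step is a single atomic operation, with no system call.
void demoUncontended() {
    SketchMutex m;
    bool ok = m.tryLockFast();  // NULL -> 0x3
    assert(ok);
    ok = m.unlockFast();        // 0x3 -> NULL
    assert(ok);
    (void)ok;
}
```

The point of the sketch is only that neither direction needs more than one
atomic instruction as long as nobody else wants the mutex.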
That comes at a price, though: the performance drops considerably when
contention happens.
When contention happens at a low rate (the "msleep(0)" case), QMutex
performance is similar to that of pthread, though slightly worse (up to 5%).
When contention happens a lot, the performance is awful. I've measured
anything from 100% to over 1000% slower than pthread.
Extrapolating these results to Mac and Windows, I expect QMutex performance in
the uncontended case to be *much* better, but to still lose horribly in the
contended case.
Conclusion: I'm glad I use Linux and that we have futex.
DATA:
Reference:
Intel i7-2620M (SandyBridge)
2 cores x 2 threads, 2.7 GHz, turbo to 3.3 GHz
CPU in "performance" governor
Linux 3.5.2
glibc 2.15
Fedora 17
GCC 4.7.1, 64-bit mode
QtCore linked with LTO
All results are the best out of 6 runs, under realtime FIFO scheduling.
Uncontended Mutex results (100 million iterations):
RESULT : tst_QMutex::uncontendedNative():
60.5891925 CPU ticks per iteration
450.189192 task-clock # 0.999 CPUs utilized
1,511,489,291 cycles # 3.357 GHz
1,306,287,711 instructions # 0.86 insns per cycle
197 raw_syscalls:sys_enter # 0.438 K/sec
0.450477229 seconds time elapsed
RESULT : tst_QMutex::uncontendedQMutex():
50.7105596 CPU ticks per iteration
379.784144 task-clock # 0.999 CPUs utilized
1,268,507,621 cycles # 3.340 GHz
745,975,928 instructions # 0.59 insns per cycle
194 raw_syscalls:sys_enter # 0.511 K/sec
0.380036271 seconds time elapsed
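For readers who want to try something similar, an uncontended measurement of
this kind has roughly the shape below. This is a sketch, not the tst_QMutex
code: the function name is made up, it reports wall-clock nanoseconds via
std::chrono instead of CPU ticks, and it only exercises the pthread side.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <pthread.h>

// Hypothetical helper: average wall-clock nanoseconds per lock/unlock pair
// on a mutex that no other thread ever touches.
double uncontendedNsPerIteration(std::uint64_t iterations) {
    assert(iterations > 0);
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    int errors = 0;

    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        // Both calls return 0 on success; collect failures and check once
        // after the loop to keep the timed section small.
        errors |= pthread_mutex_lock(&mutex);
        errors |= pthread_mutex_unlock(&mutex);
    }
    const auto stop = std::chrono::steady_clock::now();

    assert(errors == 0);
    (void)errors;
    pthread_mutex_destroy(&mutex);

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Absolute numbers from a loop like this depend heavily on the machine, the
compiler flags and the CPU governor, so they are only comparable between runs
on the same setup.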
Contended Mutex results (1000 iterations):
RESULT : tst_QMutex::contendedNative():"no msleep, 1 mutex":
2,052,212.507 CPU ticks per iteration
5814.825257 task-clock # 3.797 CPUs utilized
18,513,286,444 cycles # 3.184 GHz
13,801,932,519 instructions # 0.75 insns per cycle
8,609,051 raw_syscalls:sys_enter # 1.481 M/sec
1.531495948 seconds time elapsed
RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
4,087,893.432 CPU ticks per iteration
11037.507260 task-clock # 2.699 CPUs utilized
33,483,481,790 cycles # 3.034 GHz
21,436,137,659 instructions # 0.64 insns per cycle
12,012,804 raw_syscalls:sys_enter # 1.088 M/sec
4.088957193 seconds time elapsed
Other results were: 4.2, 5.7, 5.8, 6.7, 7.1 million ticks.
RESULT : tst_QMutex::contendedNative():"no msleep, 2 mutexes":
2,550,929.603 CPU ticks per iteration
7155.513345 task-clock # 3.763 CPUs utilized
22,760,839,897 cycles # 3.181 GHz
16,370,712,299 instructions # 0.72 insns per cycle
10,457,934 raw_syscalls:sys_enter # 1.462 M/sec
1.901400808 seconds time elapsed
RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
29,396,174.807 CPU ticks per iteration
48627.618792 task-clock # 2.183 CPUs utilized
141,749,504,525 cycles # 2.915 GHz
78,008,558,700 instructions # 0.55 insns per cycle
38,536,844 raw_syscalls:sys_enter # 0.792 M/sec
22.271697343 seconds time elapsed
100 iterations:
RESULT : tst_QMutex::contendedNative():"msleep(0), 1 mutex":
67,621,168.46 CPU ticks per iteration
4326.998212 task-clock # 0.859 CPUs utilized
11,239,050,634 cycles # 2.597 GHz
8,415,799,134 instructions # 0.75 insns per cycle
2,965,384 raw_syscalls:sys_enter # 0.685 M/sec
5.036652093 seconds time elapsed
RESULT : tst_QMutex::contendedQMutex():"msleep(0), 1 mutex":
70,621,368.59 CPU ticks per iteration
4909.514006 task-clock # 0.934 CPUs utilized
13,123,468,429 cycles # 2.673 GHz
9,532,793,349 instructions # 0.73 insns per cycle
3,619,607 raw_syscalls:sys_enter # 0.737 M/sec
5.253921952 seconds time elapsed
RESULT : tst_QMutex::contendedNative():"msleep(0), 2 mutexes":
67,478,669.37 CPU ticks per iteration
4314.232114 task-clock # 0.857 CPUs utilized
11,244,572,017 cycles # 2.606 GHz
8,382,057,867 instructions # 0.75 insns per cycle
2,939,351 raw_syscalls:sys_enter # 0.681 M/sec
5.035212837 seconds time elapsed
RESULT : tst_QMutex::contendedQMutex():"msleep(0), 2 mutexes":
70,837,078.76 CPU ticks per iteration
4933.702732 task-clock # 0.929 CPUs utilized
13,192,133,179 cycles # 2.674 GHz
9,554,807,698 instructions # 0.72 insns per cycle
3,622,623 raw_syscalls:sys_enter # 0.734 M/sec
5.309986829 seconds time elapsed
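The percentages quoted in the conclusions can be rechecked from the "CPU ticks
per iteration" figures above. The following C++ sketch does the arithmetic.
The percentSlower helper and the table layout are made up for this example;
only the numbers are taken from the results.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: how much slower `candidate` is than `baseline`,
// in percent. A negative result means `candidate` is faster.
double percentSlower(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (candidate / baseline - 1.0) * 100.0;
}

struct Comparison {
    const char *name;
    double nativeTicks;  // pthread, CPU ticks per iteration
    double qmutexTicks;  // QMutex, CPU ticks per iteration
};

// Best-of-6 figures copied from the results above.
const Comparison comparisons[] = {
    {"uncontended",           60.5891925,    50.7105596},
    {"no msleep, 1 mutex",    2052212.507,   4087893.432},
    {"no msleep, 2 mutexes",  2550929.603,   29396174.807},
    {"msleep(0), 1 mutex",    67621168.46,   70621368.59},
    {"msleep(0), 2 mutexes",  67478669.37,   70837078.76},
};

// Prints one line per benchmark: name and QMutex slowdown versus pthread.
void printComparisons() {
    for (const Comparison &c : comparisons)
        std::printf("%-22s %+9.1f%%\n", c.name,
                    percentSlower(c.nativeTicks, c.qmutexTicks));
}
```

Calling printComparisons() shows the uncontended case as a negative slowdown
(QMutex faster), the two msleep(0) cases below 5%, and the two "no msleep"
cases at just under 100% and over 1000% respectively.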
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden