[Development] QMutex with pthread on Linux
lars.knoll at nokia.com
Wed Aug 22 09:00:05 CEST 2012
On Aug 21, 2012, at 10:36 PM, ext Thiago Macieira <thiago.macieira at intel.com> wrote:
> Hello
>
> I've just done some benchmarking of QMutex on Linux, using the pthread
> implementation instead of the futex one.
>
> Conclusions first:
>
> QMutex is optimised for the uncontended case. It does that by keeping the d
> pointer at NULL while unlocked and setting it to 0x3 to indicate it's locked.
> Changing from one value to the other is extremely quick, requiring a single
> atomic operation. Uncontended, QMutex proves to be roughly 16% faster than
> pthread. This also shows in the benchmarks that use a non-zero msleep: there
> the mutex is mostly uncontended.
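A minimal sketch of the encoding described above, assuming only `std::atomic`. The class name `SentinelMutex` and its members are hypothetical, and the yield loop in `lock()` is a placeholder for the contended path (which QMutex hands to its futex or pthread backend). This is an illustration, not QMutex's source.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical illustration of "d is NULL when unlocked, 0x3 when locked".
class SentinelMutex {
public:
    // Fast path: one compare-and-swap from 0 (unlocked) to the sentinel.
    bool tryLock() noexcept {
        std::uintptr_t expected = kUnlocked;
        return d_.compare_exchange_strong(expected, kLockedSentinel,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed);
    }

    void lock() noexcept {
        // Placeholder slow path: a real mutex would block in the kernel here
        // instead of yielding and retrying.
        while (!tryLock())
            std::this_thread::yield();
    }

    void unlock() noexcept {
        // Fast path release: one compare-and-swap from the sentinel back to 0.
        std::uintptr_t expected = kLockedSentinel;
        bool ok = d_.compare_exchange_strong(expected, kUnlocked,
                                             std::memory_order_release,
                                             std::memory_order_relaxed);
        assert(ok && "unlock() called on a mutex that is not locked");
        (void)ok; // silence unused-variable warnings when NDEBUG is defined
    }

    bool isLocked() const noexcept {
        return d_.load(std::memory_order_relaxed) != kUnlocked;
    }

private:
    static constexpr std::uintptr_t kUnlocked = 0;
    static constexpr std::uintptr_t kLockedSentinel = 0x3;
    std::atomic<std::uintptr_t> d_{kUnlocked};
};
```

When no other thread holds the lock, both `lock()` and `unlock()` reduce to a single atomic instruction each with no allocation and no system call, which is why the uncontended numbers favour this design.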
>
> That comes at a price, though: the performance drops considerably when
> contention happens.
>
> When contention happens at a low rate (the "msleep(0)" case), QMutex
> performance is similar to that of pthread, though slightly worse (up to 5%).
>
> When contention happens a lot, the performance is awful. I've measured
> anything from 100% slower to over 1000% slower.
Uhhh… is there a reason why it's that much slower than the pthread mutex? A factor of 2 to 10 slower is not really what we should have.
Cheers,
Lars
>
> Extrapolating these results to Mac and Windows, I expect QMutex performance in
> the uncontended case to be *much* better, but to still lose horribly in the
> contended case.
>
> Conclusion: I'm glad I use Linux and that we have futex.
>
> DATA:
>
> Reference:
> Intel i7-2620M (SandyBridge)
> 2 cores x 2 threads, 2.7 GHz, turbo to 3.3 GHz
> CPU in "performance" governor
> Linux 3.5.2
> glibc 2.15
> Fedora 17
> GCC 4.7.1, 64-bit mode
> QtCore linked with LTO
>
> All results are the best out of 6 runs, under realtime FIFO scheduling.
>
> Uncontended Mutex results (100 million iterations):
>
> RESULT : tst_QMutex::uncontendedNative():
> 60.5891925 CPU ticks per iteration
> 450.189192 task-clock # 0.999 CPUs utilized
> 1,511,489,291 cycles # 3.357 GHz
> 1,306,287,711 instructions # 0.86 insns per cycle
> 197 raw_syscalls:sys_enter # 0.438 K/sec
> 0.450477229 seconds time elapsed
>
> RESULT : tst_QMutex::uncontendedQMutex():
> 50.7105596 CPU ticks per iteration
> 379.784144 task-clock # 0.999 CPUs utilized
> 1,268,507,621 cycles # 3.340 GHz
> 745,975,928 instructions # 0.59 insns per cycle
> 194 raw_syscalls:sys_enter # 0.511 K/sec
> 0.380036271 seconds time elapsed
>
> Contended Mutex results (1000 iterations):
>
> RESULT : tst_QMutex::contendedNative():"no msleep, 1 mutex":
> 2,052,212.507 CPU ticks per iteration
> 5814.825257 task-clock # 3.797 CPUs utilized
> 18,513,286,444 cycles # 3.184 GHz
> 13,801,932,519 instructions # 0.75 insns per cycle
> 8,609,051 raw_syscalls:sys_enter # 1.481 M/sec
> 1.531495948 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
> 4,087,893.432 CPU ticks per iteration
> 11037.507260 task-clock # 2.699 CPUs utilized
> 33,483,481,790 cycles # 3.034 GHz
> 21,436,137,659 instructions # 0.64 insns per cycle
> 12,012,804 raw_syscalls:sys_enter # 1.088 M/sec
> 4.088957193 seconds time elapsed
>
> Other runs of this test gave 4.2, 5.7, 5.8, 6.7 and 7.1 million ticks per iteration.
>
> RESULT : tst_QMutex::contendedNative():"no msleep, 2 mutexes":
> 2,550,929.603 CPU ticks per iteration
> 7155.513345 task-clock # 3.763 CPUs utilized
> 22,760,839,897 cycles # 3.181 GHz
> 16,370,712,299 instructions # 0.72 insns per cycle
> 10,457,934 raw_syscalls:sys_enter # 1.462 M/sec
> 1.901400808 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
> 29,396,174.807 CPU ticks per iteration
> 48627.618792 task-clock # 2.183 CPUs utilized
> 141,749,504,525 cycles # 2.915 GHz
> 78,008,558,700 instructions # 0.55 insns per cycle
> 38,536,844 raw_syscalls:sys_enter # 0.792 M/sec
> 22.271697343 seconds time elapsed
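For anyone wanting to reproduce the native side of the "no msleep, 1 mutex" case, here is a hypothetical stress loop written directly against pthread. `runContendedNative`, `SharedState` and the choice of parameters are illustrative assumptions; this is not the tst_QMutex source, and timing is left to an external tool.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Hypothetical "no msleep, 1 mutex" style loop: every thread hammers one lock.
namespace {

struct SharedState {
    pthread_mutex_t mutex;
    long counter = 0;
    int iterations = 0;
};

void *worker(void *arg) {
    auto *state = static_cast<SharedState *>(arg);
    for (int i = 0; i < state->iterations; ++i) {
        pthread_mutex_lock(&state->mutex);   // all threads contend for this lock
        ++state->counter;                    // tiny critical section
        pthread_mutex_unlock(&state->mutex); // no sleep: re-contend immediately
    }
    return nullptr;
}

} // namespace

// Runs `threads` workers, each doing `iterations` lock/unlock pairs on the same
// mutex, and returns the final counter (threads * iterations if locking works).
long runContendedNative(int threads, int iterations) {
    SharedState state;
    pthread_mutex_init(&state.mutex, nullptr);
    state.iterations = iterations;

    std::vector<pthread_t> ids(threads);
    for (auto &id : ids) {
        int rc = pthread_create(&id, nullptr, worker, &state);
        assert(rc == 0);
        (void)rc;
    }
    for (auto id : ids)
        pthread_join(id, nullptr);

    pthread_mutex_destroy(&state.mutex);
    return state.counter;
}
```

With four threads (matching the 2 cores x 2 threads machine above), `runContendedNative(4, 1000)` returns 4000. Swapping the pthread calls for a QMutex gives the other side of the comparison.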
>
> Contended Mutex results with msleep(0) (100 iterations):
> RESULT : tst_QMutex::contendedNative():"msleep(0), 1 mutex":
> 67,621,168.46 CPU ticks per iteration
> 4326.998212 task-clock # 0.859 CPUs utilized
> 11,239,050,634 cycles # 2.597 GHz
> 8,415,799,134 instructions # 0.75 insns per cycle
> 2,965,384 raw_syscalls:sys_enter # 0.685 M/sec
> 5.036652093 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"msleep(0), 1 mutex":
> 70,621,368.59 CPU ticks per iteration
> 4909.514006 task-clock # 0.934 CPUs utilized
> 13,123,468,429 cycles # 2.673 GHz
> 9,532,793,349 instructions # 0.73 insns per cycle
> 3,619,607 raw_syscalls:sys_enter # 0.737 M/sec
> 5.253921952 seconds time elapsed
>
> RESULT : tst_QMutex::contendedNative():"msleep(0), 2 mutexes":
> 67,478,669.37 CPU ticks per iteration
> 4314.232114 task-clock # 0.857 CPUs utilized
> 11,244,572,017 cycles # 2.606 GHz
> 8,382,057,867 instructions # 0.75 insns per cycle
> 2,939,351 raw_syscalls:sys_enter # 0.681 M/sec
> 5.035212837 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"msleep(0), 2 mutexes":
> 70,837,078.76 CPU ticks per iteration
> 4933.702732 task-clock # 0.929 CPUs utilized
> 13,192,133,179 cycles # 2.674 GHz
> 9,554,807,698 instructions # 0.72 insns per cycle
> 3,622,623 raw_syscalls:sys_enter # 0.734 M/sec
> 5.309986829 seconds time elapsed
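The percentages in the summary can be checked against the "CPU ticks per iteration" lines above. This small program divides QMutex ticks by native ticks for each pair. The names are hypothetical; the numbers are copied from the results.

```cpp
#include <cassert>

// CPU ticks per iteration for one benchmark row, copied from the results above.
struct TickPair {
    double native; // tst_QMutex::uncontendedNative / contendedNative
    double qmutex; // tst_QMutex::uncontendedQMutex / contendedQMutex
};

// QMutex ticks divided by native pthread ticks: < 1.0 means QMutex is faster.
constexpr double ratio(TickPair p) { return p.qmutex / p.native; }

constexpr TickPair kUncontended{60.5891925, 50.7105596};
constexpr TickPair kNoSleep1Mutex{2052212.507, 4087893.432};
constexpr TickPair kNoSleep2Mutexes{2550929.603, 29396174.807};
constexpr TickPair kSleep0OneMutex{67621168.46, 70621368.59};
constexpr TickPair kSleep0TwoMutexes{67478669.37, 70837078.76};

// Checks each summary claim against the measured ratios.
void checkQuotedClaims() {
    // "roughly 16% faster": QMutex needs about 84% of the native ticks.
    assert(ratio(kUncontended) > 0.83 && ratio(kUncontended) < 0.84);
    // msleep(0): "slightly worse (up to 5%)".
    assert(ratio(kSleep0OneMutex) > 1.0 && ratio(kSleep0OneMutex) <= 1.05);
    assert(ratio(kSleep0TwoMutexes) > 1.0 && ratio(kSleep0TwoMutexes) <= 1.05);
    // No msleep: "100% slower" (about 2x) to "over 1000%" (more than 11x).
    assert(ratio(kNoSleep1Mutex) > 1.99);
    assert(ratio(kNoSleep2Mutexes) > 11.0);
}
```

The ratios come out at about 0.84, 1.99, 11.5, 1.044 and 1.050 respectively.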
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
> Software Architect - Intel Open Source Technology Center
> Intel Sweden AB - Registration Number: 556189-6027
> Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
> _______________________________________________
> Development mailing list
> Development at qt-project.org
> http://lists.qt-project.org/mailman/listinfo/development