[Development] Mutex future directions
bradley.hughes at nokia.com
Tue May 22 09:12:26 CEST 2012
On May 21, 2012, at 2:58 PM, ext Thiago Macieira wrote:
> On segunda-feira, 21 de maio de 2012 08.34.32, bradley.hughes at nokia.com wrote:
>> On May 18, 2012, at 8:34 PM, ext Thiago Macieira wrote:
>>> Recommendations (priority):
>>>
>>> (P0) de-inline QBasicMutex locking functions until we solve some or all of
>>> the below problems
>>
>> I agree with this, so that it gives a chance to fix the performance
>> regressions on Mac at a later date (since it probably won't be fixed before
>> 5.0 is released).
>
> Some notes from the IRC discussion this morning between Olivier, Brad and
> myself:
>
> * QMutex contended performance has dropped considerably on Mac from 4.8 to
> 5.0 (it's 10x slower)
> * QMutex contended performance on Mac is now actually similar to the
> pthread_mutex_t performance (read: contended QMutex on 4.8 is 10x faster than
> pthread_mutex_t)
> * changing the QMutex implementation to use the generic Unix codepath on Mac
> makes it 2x slower
> * the non-Linux code in QBasicMutex::lockInternal is considered complex and
> hard to read by both Brad and myself
>
> Brad: could you please provide what is, to the best of your knowledge today,
> the combination of tricks that made 4.8 fast?
The trick was the adaptive spin, added and modified over a series of commits in 4.8. The biggest gain was on Mac; Linux performance didn't change noticeably, and Windows got a small gain too (as far as I recall).
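For readers unfamiliar with the term, here is a minimal sketch of what adaptive spinning generally means. It is not the 4.8 QMutex code: the class name, the spin bounds, the doubling/halving policy, and the use of std::mutex as the blocking fallback are all assumptions made for illustration. The idea is to retry the lock a bounded number of times before blocking, and to tune that bound according to whether spinning recently paid off.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of adaptive spinning; NOT the Qt 4.8 QMutex code.
constexpr int kMinSpins = 1;     // assumed lower bound on the spin budget
constexpr int kMaxSpins = 1024;  // assumed upper bound on the spin budget

class AdaptiveSpinMutex {
public:
    void lock() {
        // Uncontended fast path: no spinning, no budget change.
        if (inner_.try_lock())
            return;

        const int budget = spinBudget_.load(std::memory_order_relaxed);
        for (int i = 0; i < budget; ++i) {
            std::this_thread::yield();
            if (inner_.try_lock()) {
                // Spinning paid off, so allow more of it next time.
                spinBudget_.store(std::min(budget * 2, kMaxSpins),
                                  std::memory_order_relaxed);
                return;
            }
        }
        // Spinning was wasted work: spin less next time, then block.
        spinBudget_.store(std::max(budget / 2, kMinSpins),
                          std::memory_order_relaxed);
        inner_.lock();
    }

    void unlock() { inner_.unlock(); }

    int spinBudget() const { return spinBudget_.load(std::memory_order_relaxed); }

private:
    std::mutex inner_;                 // blocking fallback (an assumption)
    std::atomic<int> spinBudget_{64};  // assumed starting budget
};

// Runs `threads` workers that each add `iterations` to a shared counter.
long runCounterDemo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    AdaptiveSpinMutex mutex;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<AdaptiveSpinMutex> guard(mutex);
                ++counter;
            }
        });
    }
    for (auto &w : workers)
        w.join();
    return counter;
}
```

Calling runCounterDemo(4, 10000) should return 40000 if the lock is doing its job. Whether spinning helps at all depends heavily on the platform and on how long the lock is held, which is consistent with the gains differing between Mac, Linux and Windows.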
> * QMutex de-inlining and the Mac performance issues are orthogonal.
> * QMutex "de-inlining" should be understood more correctly as: removing the
> testAndSet calls from the inline functions. The inline functions should remain
> inline.
> * The de-inlining is important for Valgrind (helgrind / DRD) to work
> properly, even in release mode
Lars and I had a conversation in the hallway about QMutex performance on Windows. It's been a while since I last tested, but I recall that QMutex didn't outperform CRITICAL_SECTIONs. De-inlining is necessary so that we can make QMutex nothing more than a wrapper around CRITICAL_SECTION (since the latter performs better).
So far, we've got 3 votes for de-inlining: Thiago, Lars, and myself. For the few cases where inlining matters, we can inline inside Qt at those locations (QMetaObject::activate() would be the first place to check).
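To make the "de-inlining" point concrete, here is a rough sketch of the two shapes being discussed. SketchMutex and its member names are hypothetical, and the slow path is a plain yield loop rather than anything Qt does. In the first variant the test-and-set is compiled into every caller; in the second the inline function stays inline but only forwards to an out-of-line function, so the implementation behind it can change (or be intercepted by a tool such as helgrind) without recompiling callers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sketch; these names are not Qt's.
class SketchMutex {
public:
    // "Before": the test-and-set is part of the inline function, so an
    // uncontended acquisition never reaches the out-of-line code.
    void lockWithInlineFastPath() {
        bool expected = false;
        if (!locked_.compare_exchange_strong(expected, true,
                                             std::memory_order_acquire))
            lockInternal();
    }

    // "After": still an inline function, but it only forwards.
    void lock() { lockInternal(); }

    bool tryLock();
    void unlock();

private:
    void lockInternal();  // defined out of line, below
    std::atomic<bool> locked_{false};
};

// In a real library these definitions would live in a .cpp file.
void SketchMutex::lockInternal() {
    bool expected = false;
    while (!locked_.compare_exchange_weak(expected, true,
                                          std::memory_order_acquire)) {
        expected = false;           // a failed exchange overwrites `expected`
        std::this_thread::yield();  // placeholder for a real wait strategy
    }
}

bool SketchMutex::tryLock() {
    bool expected = false;
    return locked_.compare_exchange_strong(expected, true,
                                           std::memory_order_acquire);
}

void SketchMutex::unlock() {
    assert(locked_.load(std::memory_order_relaxed));  // unlocking an unlocked mutex is a bug
    locked_.store(false, std::memory_order_release);
}
```

The cost of the second shape is one function call on the uncontended path, which is why hot spots inside Qt itself might still want the fast path inlined by hand.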
> Note that there's another trick that QMutex can apply under valgrind but
> QBasicMutex cannot: if the QMutex constructor initialises the d pointer to
> anything non-null and different from 3, the inlined testAndSet will fail, so
> valgrind can properly hijack the lock and unlock functions.
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
> Software Architect - Intel Open Source Technology Center
> Intel Sweden AB - Registration Number: 556189-6027
> Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
> _______________________________________________
> Development mailing list
> Development at qt-project.org
> http://lists.qt-project.org/mailman/listinfo/development
--
Bradley T. Hughes
bradley.hughes at nokia.com