[Development] Mutex future directions

Tue May 22 09:12:26 CEST 2012

On May 21, 2012, at 2:58 PM, ext Thiago Macieira wrote:

> On Monday, 21 May 2012 08.34.32, bradley.hughes at nokia.com wrote:
>> On May 18, 2012, at 8:34 PM, ext Thiago Macieira wrote:
>>> Recommendations (priority):
>>> 
>>> (P0) de-inline QBasicMutex locking functions until we solve some or all of
>>> the below problems
>> 
>> I agree with this, so that it gives a chance to fix the performance
>> regressions on Mac at a later date (since it probably won't be fixed before
>> 5.0 is released).
> 
> Some notes from the IRC discussion this morning between Olivier, Brad and 
> myself:
> 
> * QMutex contended performance has dropped considerably on Mac from 4.8 to 
> 5.0 (it's 10x slower)
> * QMutex contended performance on Mac is now actually similar to the 
> pthread_mutex_t performance (read: contended QMutex on 4.8 is 10x faster than 
> pthread_mutex_t)
> * changing the QMutex implementation to use the generic Unix codepath on Mac 
> makes it 2x slower
> * the non-Linux code in QBasicMutex::lockInternal is considered complex and 
> hard to read by both Brad and myself
> 
> Brad: could you please provide what is, to the best of your knowledge today, 
> the combination of tricks that made 4.8 fast?

The trick was the adaptive spin, added and modified over a series of commits in 4.8. The biggest gain was on Mac; Linux performance didn't change noticeably, and Windows got a small gain too (as far as I recall).
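
To make the idea concrete, here is a minimal sketch of adaptive spinning. It is not the 4.8 code: the class name, the spin limits, and the condition-variable fallback are all assumptions chosen for illustration. The lock spins for a bounded number of attempts before sleeping, and the bound grows when spinning pays off and shrinks when it does not.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of adaptive spinning; NOT the QMutex implementation.
class AdaptiveSpinMutex {
public:
    bool tryLock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void lock() {
        // Phase 1: spin for a bounded number of attempts.
        const int budget = spinBudget_.load(std::memory_order_relaxed);
        for (int i = 0; i < budget; ++i) {
            if (tryLock()) {
                adjustBudget(budget + 1);   // spinning paid off: spin a bit more next time
                return;
            }
            std::this_thread::yield();
        }
        // Phase 2: spinning failed, so shrink the budget and sleep until woken.
        adjustBudget(budget / 2);
        std::unique_lock<std::mutex> guard(sleepMutex_);
        sleepCond_.wait(guard, [this] { return tryLock(); });
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
        {
            // Releasing under sleepMutex_ avoids a lost wakeup. This keeps the
            // sketch simple, not fast: a real lock would skip it when nobody sleeps.
            std::lock_guard<std::mutex> guard(sleepMutex_);
            locked_.store(false, std::memory_order_release);
        }
        sleepCond_.notify_one();
    }

private:
    static constexpr int kMinSpins = 1;      // assumed limits, picked arbitrarily
    static constexpr int kMaxSpins = 1024;

    void adjustBudget(int wanted) {
        if (wanted < kMinSpins) wanted = kMinSpins;
        if (wanted > kMaxSpins) wanted = kMaxSpins;
        // The budget is only a heuristic, so a racy update is acceptable.
        spinBudget_.store(wanted, std::memory_order_relaxed);
    }

    std::atomic<bool> locked_{false};
    std::atomic<int> spinBudget_{64};
    std::mutex sleepMutex_;
    std::condition_variable sleepCond_;
};

// Contended usage: several threads increment one counter under the lock.
long runCounterDemo(int threads, int itersPerThread) {
    AdaptiveSpinMutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < itersPerThread; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto &th : pool) th.join();
    return counter;
}
```

The point of adapting the budget is that short critical sections get resolved by spinning without ever sleeping, while long ones quickly stop wasting CPU on spins.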

> * QMutex de-inlining and the Mac performance issues are orthogonal.
> * QMutex "de-inlining" should be understood more correctly as: removing the 
> testAndSet calls from the inline functions. The inline functions should remain 
> inline.
> * The de-inlining is important for Valgrind (helgrind / DRD) to work 
> properly, even in release mode

Lars and I had a conversation in the hallway about QMutex performance on Windows. It's been a while since I last tested, but I recall that QMutex didn't outperform CRITICAL_SECTION. De-inlining is necessary so that we can make QMutex nothing more than a wrapper around CRITICAL_SECTION (since the latter performs better).

So far, we've got 3 votes for de-inlining: Thiago, Lars, and myself. For the few cases where inlining matters, we can inline inside Qt at those locations (QMetaObject::activate() would be the first place to check).
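
As a rough sketch of what "de-inlining" means here (all names below are hypothetical, and the spin loop is only a stand-in for the real locking code): the public functions stay inline, but the atomic test-and-set moves into out-of-line functions. Callers then never embed the lock protocol in their own code, so the implementation behind those functions can change, and tools that intercept function calls still see every lock and unlock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sketch; NOT the QBasicMutex code.
class SketchMutex {
public:
    // These remain inline, but contain no atomic operation themselves.
    void lock()    { lockOutOfLine(); }
    bool tryLock() { return tryLockOutOfLine(); }
    void unlock()  { unlockOutOfLine(); }

private:
    // In a real library these would be defined in a .cpp file and exported,
    // so application binaries only ever contain calls to them.
    void lockOutOfLine();
    bool tryLockOutOfLine();
    void unlockOutOfLine();

    std::atomic<bool> locked_{false};
};

bool SketchMutex::tryLockOutOfLine() {
    bool expected = false;
    return locked_.compare_exchange_strong(expected, true,
                                           std::memory_order_acquire);
}

void SketchMutex::lockOutOfLine() {
    // Placeholder strategy: yield until the test-and-set succeeds.
    // This body could be swapped for a platform primitive without
    // touching any caller.
    while (!tryLockOutOfLine())
        std::this_thread::yield();
}

void SketchMutex::unlockOutOfLine() {
    assert(locked_.load(std::memory_order_relaxed));
    locked_.store(false, std::memory_order_release);
}
```

The cost is one function call on the uncontended path, which is why the few hot spots inside Qt would be handled separately.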

> Note that there's another trick that QMutex can apply under valgrind but 
> QBasicMutex cannot: if the QMutex constructor initialises the d pointer to 
> anything non-null and different from 3, the inlined testAndSet will fail, so 
> valgrind can properly hijack the lock and unlock functions.
> 
> -- 
> Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel Open Source Technology Center
>     Intel Sweden AB - Registration Number: 556189-6027
>     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden

--
Bradley T. Hughes
bradley.hughes at nokia.com