[Development] Mutex future directions

lars.knoll at nokia.com
Sat May 19 09:30:56 CEST 2012

A few comments from my side:

* I am not a big fan of inlining the public classes either. It feels a
bit like over-optimizing in the wrong place. The reason is that this might
make it impossible to refactor our code later on.
* It's ok to have some low-level inline class for internal use if that
helps us in performance-critical parts (such as signal/slot code). Nothing
says this and the public QMutex class have to be the same.
* IMO we should really try to allow using tools such as helgrind with
release builds as well.
* It would be good to see how much real-world difference (i.e. not in
artificial benchmarks) inlining the mutex code makes.
Is it really relevant?
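
To make the split in the second point concrete, here is a minimal sketch in
standard C++, not Qt's actual code. SketchBasicMutex, SketchMutex and runDemo
are hypothetical names, and the slow path simply yields where a real
implementation would block on a futex, event or semaphore. The internal class
keeps its fast path inline; the public class declares everything out of line,
so its implementation could change without touching callers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical internal class: the uncontended path is inline, and
// contention is handled by a single out-of-line function.
class SketchBasicMutex {
public:
    void lock() {
        bool expected = false;
        // Fast path: one compare-and-swap when nobody holds the lock.
        if (!locked.compare_exchange_strong(expected, true,
                                            std::memory_order_acquire))
            lockContended();
    }
    void unlock() { locked.store(false, std::memory_order_release); }

private:
    void lockContended(); // would live in a .cpp file in a real library
    std::atomic<bool> locked{false};
};

// Slow path. Assumption: a real implementation would block here;
// this sketch only yields and retries.
void SketchBasicMutex::lockContended() {
    bool expected = false;
    while (!locked.compare_exchange_weak(expected, true,
                                         std::memory_order_acquire)) {
        expected = false; // a failed exchange overwrites 'expected'
        std::this_thread::yield();
    }
}

// Hypothetical public class: only declarations are visible to users,
// so the implementation behind lock()/unlock() stays replaceable.
class SketchMutex {
public:
    void lock();
    void unlock();

private:
    SketchBasicMutex impl;
};

void SketchMutex::lock() { impl.lock(); }
void SketchMutex::unlock() { impl.unlock(); }

// Four threads increment a shared counter under the public mutex.
int runDemo() {
    SketchMutex m;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto &th : threads)
        th.join();
    assert(counter == 40000);
    return counter;
}
```

In a single file the compiler may still inline the out-of-line functions; the
point is only where the definitions would sit relative to the library
boundary. Timing runDemo() against a variant that uses SketchBasicMutex
directly gives exactly the kind of artificial number that says little about
real-world code.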


On 5/19/12 12:36 AM, "ext Olivier Goffart" <olivier at woboq.com> wrote:

>On Friday 18 May 2012 23:25:47 Thiago Macieira wrote:
>> > > QBasicMutex is an internal POD class that offers non-recursive
>> > > It's incredibly efficient for Linux, where the single pointer-sized
>> > > member variable is enough to execute the futex operations, in all
>> > > For other platforms, we get the same efficiency in non-contended
>> > > but incur a non-negligible performance penalty when contention
>> > 
>> > Which non-negligible performance penalty are you talking about here?
>> The need to allocate memory and do the test-and-set loop for setting the
>> private.
>There is no memory allocation; it uses the QFreeList.
>And this happens when we are about to do a context switch. So I believe the
>test-and-set loop should be negligible.
>> Right, and users use the classes as intended... :-P
>(QBasicMutex is an undocumented internal class)
>> Introducing the noop valgrind code (a 32-bit rotate) still consumes CPU
>> resources. It will consume front-end decoding, one ALU port, the
>> not to mention increased code size. There's no free lunch.
>Slower than a function call (which will likely spill registers)? I don't
>believe so.
>(Moreover, we are talking about debug builds, right?)
>> > That is not relevant.
>> > Transactional memory requires a different set of primitives. QMutex
>> > nothing to do with transactional memory.
>> Then you didn't read the manual. I'm going to ignore what you said until
>> you read the manual because transactional memory has everything to do with
>> QMutex.
>> Please, either believe me or read the manual.
>Do you have a link to that manual?
>Transactional memory is about detecting conflicts between transactions,
>rolling back.
>Mutexes are about locking until the resource is free
>Transactional memory could be used to simplify the code that allocates the
>But I doubt the hypothetical gain is even worth the function call.
>Because remember the uncontended case is much more critical than the
>case. (There is no point in winning a dozen cycles if we are going to
>anyway several hundred in the context switch that follows)
>But fast uncontended case is critical because it allows the use of mutex
>part that might or not be used with threads. Example: QObject. We need to
>mutexes while emitting a signal because maybe it is used in multiple
>But we don't want to pay too much when used in a single thread (the
>> > > (P2) optimise the Mac and Windows implementations to avoid the need
>> > > allocating a dynamic d pointer in case of contention. In fact,
>> > > the
>> > > need for dynamic d-pointer allocation altogether: Mac, Windows and
>> > > should never do it, while the generic Unix implementation should do
>> > > all
>> > > the time in the constructor
>> > 
>> > The allocation of the d-pointer is not that dynamic.
>> > It is already quite optimized in Qt5
>> My suggestion was to avoid the QMutexPrivate allocation on Mac and
>> since they require just a bit of initialisation. Now, given what you
>> about semaphore_t, we may not be able to do it on the Mac. But we can try to
>> apply the same optimisation for Windows -- the initialisation is a call
>> CreateEvent.
>But CreateEvent probably still allocates memory behind the scenes.
>> > I believe there are more important priorities right now than touching
>> > which already had its lifting for Qt5.
>> I disagree. If we shoot ourselves in the foot by not being able to
>> TSX and valgrind in Qt 5, we've lost a lot. That's why I propose
>> de-inlining for 5.0, so we have enough time to investigate those
>> drawbacks.
>I think the inline is not a problem for valgrind.
>And I don't think we can gain much with TSX.
>And even if we could, we still do pretty much everything in a binary
>compatible way despite the inlines (We can say 2 means unlocked and 3
>then the old code falls back to the non-inline case (The first lock would
>handle the 0 or 1))
>However, is it not shooting ourselves in the foot not to inline it? Because
>hardly can inline it later in a binary compatible way.
>Woboq - Qt services and support - http://woboq.com