[Development] Mutex future directions

Olivier Goffart olivier at woboq.com
Sat May 19 00:36:56 CEST 2012


On Friday 18 May 2012 23:25:47 Thiago Macieira wrote:
> > > QBasicMutex is an internal POD class that offers non-recursive locking.
> > > It's incredibly efficient for Linux, where the single pointer-sized
> > > member variable is enough to execute the futex operations, in all cases.
> > > For other platforms, we get the same efficiency in non-contended cases,
> > > but incur a non-negligible performance penalty when contention happens.
> > 
> > Which non-negligible performance penalty are you takling about here?
> 
> The need to allocate memory and do the test-and-set loop for setting the
> private.

There is no memory allocation, it is using the QFreeList.
And this happens when we are about to do a context switch. So I beleive the 
test-and-set loop should be neglectible.

> Right, and users use the classes as intended... :-P

(QBasicMutex is an undocumented internal class)

> Introducing the noop valgrind code (a 32-bit rotate) still consumes CPU
> resources. It will consume front-end decoding, one ALU port, the retirement,
> not to mention increased code size. There's no free lunch.

Slower than a function call (which will likely spill register)?  I don't 
beleive so.

(moreever, we are talking about debug build, right?)

> > That is not relevant.
> > Transactional memory requires a different set of primitives. QMutex has
> > nothing to do with transactional memory.
> 
> Then you didn't read the manual. I'm going to ignore what you said until you
> read the manual because transactional memory has everythig to do with
> QMutex.
> 
> Please, either believe me or read the manual.

Do you have a link to that manuel?

Transactional memory is about detecting conflicts between transactions, and 
rolling back.
Mutexes are about locking until the resource is free

Transactional memory could be used to simplify the code that allocate the 
QMutexPrivate.

But I doubt the hypothetical gain is even worth the function call.

Because remember the uncontended case is much more critical than the contended 
case. (There is no point in winning a dozens of cycles if we are going to pay 
anyway several hendreds in the context switch that follow)

But fast uncontended case is critical because it allows the use of mutex in 
part that might or not be used with threads. Example: QObject. We need to put 
mutexes while emiting a signal because maybe it is used in multiple thread. 
But we don't want to pay too much when used in a single thread (the common 
case)

> > > (P2) optimise the Mac and Windows implementations to avoid the need for
> > > allocating a dynamic d pointer in case of contention. In fact, remove
> > > the
> > > need for dynamic d-pointer allocation altogether: Mac, Windows and Linux
> > > should never do it, while the generic Unix implementation should do it
> > > all
> > > the time in the constructor
> > 
> > The allocation of the d-ponter is not that dynamic.
> > It is already quite optimized in Qt5
> 
> My suggestion was to avoid the QMutexPrivate allocation on Mac and Windows,
> since they require just a bit of initialisation. Now, given what you said
> about semaphore_t, we may not be able to do it on the Mac. But we can try to
> apply the same optimisation for Windows -- the initialisation is a call to
> CreateEvent.

But CreateEvent still probably allocate memory behind.

> > I beleive there is more important priority right now than touching QMutex,
> > which already had its lifting for Qt5.
> 
> I disagree. If we shoot ourselves in the foot by not being able to support
> TSX and valgrind in Qt 5, we've lost a lot. That's why I propose
> de-inlining for 5.0, so we have enough time to investigate those potential
> drawbacks.

I think the inline is not a problem for valgrind. 

And I don't think we can gain much with TSX. 

And even if we could, we still do pretty much everything in a binary 
compatible way despite the inlines (We can say 2 means unlocked and 3 locked, 
then the old code fallbacks to the non-inline case (The first lock would 
handle the 0 or 1))

Is it not however shooting ourself in the feet not to inline it? Because we 
hardly can inline it later in a binary compatible way.

-- 
Olivier

Woboq - Qt services and support - http://woboq.com



More information about the Development mailing list