[Development] Mutex future directions

Thiago Macieira thiago.macieira at intel.com
Sun May 20 11:11:04 CEST 2012


On domingo, 20 de maio de 2012 02.17.53, Olivier Goffart wrote:
> On Saturday 19 May 2012 19:05:58 Thiago Macieira wrote:
> > On sábado, 19 de maio de 2012 18.34.30, Olivier Goffart wrote:
> > > Hi,
> > > 
> > > Regarding valgrind:
> > >  *) On debug build, nothing is inlined.
> > >  *) If we keep it inline, then we would just need a patch like this [1]
> > 
> > -fno-inline doesn't help because of -fvisibility-inlines-hidden. The call
> > cannot be rerouted to valgrind.
> 
> Visibility does not really matter for valgrind. It does address redirection,
> using the debug symbols.

I see...

After playing with an application under helgrind, in the debugger, I see it 
doesn't actually use ELF symbol interposition or even my second option of 
inserting jumps. Since valgrind is a CPU interpreter, it simply knows when 
you've reached the beginning of an intercepted function and transfers control 
to the interceptor.

Provided it knows that the same function can exist in multiple libraries, it's 
fine.

Anyway, we still need to approach the valgrind community and settle the 
question.

> > The annotation you added might help, but as I said, adding instructions --
> > even if they produce no architectural change -- still consumes CPU
> > resources. I'd like to benchmark the annotation vs the function call.
> 
> Yes, they have a cost which I am not sure we want to pay on release build.

But since we may want to helgrind release builds...

> > Indeed, but note what it says about transactions that abort too often.
> > If the transaction aborts, then the code needs to be re-run
> > non-transactionally, with the lock. That means decreased performance and
> > increased power consumption.
> 
> Yes, but we are talking about the rare case in which a QMutex is shared
> between two objects compiled with different versions of Qt.
> And in that unlikely case, one can just recompile to fix the performance
> issue.

That's not what I meant. I meant that, if we were to add the XACQUIRE and 
XRELEASE prefixes to all mutexes, we might end up with worse performance for 
5.0 and 5.1 applications when run on Haswell, because we've never tested it.

Then again, I am asking for slightly decreased performance in all 
situations.

> Indeed, QMutex can be used for all sorts of cases.  There can also be way
> too much code in the critical section to fit into the transaction cache. Or
> maybe there are side effects.
> 
> QMutexLocker lock(&mutex);
> qDebug() << "What now?  Does it also restart the transaction?";

Yes, a SYSENTER will definitely cause a transaction abort.

Which is why it might be a good idea to use RTM instead of HLE in QMutex, so 
we know which mutexes abort and we don't try again next time.

> So it is probably bad to do the lock elision within QMutex...
> We need to test it on real hardware to see if it works.
> 
> But my point is that the current QMutex architecture does not keep us from
> using lock elision later.

Mixing different builds of QMutex might be even worse. If a lock is acquired 
with HLE and released without, the transaction will keep running until it 
aborts. And I have no clue what happens if you XRELEASE when no transaction is 
running. It will definitely cause trouble if we use RTM.

Anyway, it might be something we can fix for 5.2, but are we prepared to take 
the chance?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden