[Interest] Heavily Commented Example: Simple Single Frontend with Two Backends
Thomas McGuire
thomas.mcguire at kdab.com
Wed Oct 24 10:21:36 CEST 2012
Hi,
thanks a lot, Thiago, for expanding on this, especially on those areas I didn't
have good knowledge about. I didn't know that all loads and stores on x86 are
fully ordered; reading something about a "store buffer" had made me believe
otherwise.
Quite interesting topic, this.
Thanks,
Thomas
On Wednesday 24 October 2012 00:11:37 Thiago Macieira wrote:
> On Tuesday, 23 October 2012 16.14.11, K. Frank wrote:
> > > volatile only forces the compiler to create an instruction to fetch the
> > > variable from memory again, to prevent caching in a register. The CPU
> > > doesn't even know about the volatile keyword anymore, it just sees a
> > > normal fetch instruction, and can therefore use the CPU cache.
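
To make that concrete, here is a minimal C++ sketch (the names are made up)
of what volatile does and does not buy you:

    bool plainFlag = false;
    volatile bool volatileFlag = false;

    void waitPlain()
    {
        // The compiler may hoist the load out of the loop and spin forever
        // on a register copy of plainFlag.
        while (!plainFlag) {}
    }

    void waitVolatile()
    {
        // The compiler has to emit a load instruction on every iteration,
        // but the CPU is still free to serve that load from its caches.
        while (!volatileFlag) {}
    }
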
> >
> > If I understand the original purpose of volatile, to confirm, reading a
> > volatile should cause a true memory fetch, not just a fetch from
> > cache. (As you mention below, if volatile is used for memory-mapped
> > IO, that IO won't actually occur if the fetch is just a cache fetch.)
>
> No, that's not what it means.
>
> A volatile load requires the compiler to *load* again, but it doesn't
> instruct the processor how to load it. The processor may serve the load
> from any cache level or from main memory.
>
> Unless the ABI says otherwise. The only ABI I know of that says otherwise is
> IA-64's, which requires a volatile load to be done using the "load acquire"
> instruction.
>
> What you're missing is that MMIO requires the memory address to be
> uncacheable. That means the processor will bypass all cache levels and will
> just issue the right load in the memory bus. But all of that is outside the
> compiler's control. It simply loads from an address you gave it.
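
As an illustration (the register address below is invented, and on real
hardware the OS or firmware also has to map that page as uncacheable),
volatile only guarantees that the access is emitted at all:

    #include <cstdint>

    // 0xFEED0000 is a made-up device register address.
    volatile std::uint32_t *statusReg =
        reinterpret_cast<volatile std::uint32_t *>(0xFEED0000u);

    std::uint32_t readStatus()
    {
        // volatile forces the compiler to emit this load on every call;
        // whether the load bypasses the caches depends on how the page is
        // mapped, not on the keyword.
        return *statusReg;
    }
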
>
> > > Therefore, if your two threads are living on different CPUs, one CPU
> > > might not see the update on the other CPU, since the CPU caches are
> > > not updated. volatile does not help with that, you need proper memory
> > > barriers.
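
The portable way to get those barriers these days is std::atomic with
acquire/release ordering; a minimal sketch (variable names invented):

    #include <atomic>

    int payload = 0;                        // plain, non-atomic data
    std::atomic<bool> payloadReady(false);

    void producerThread()
    {
        payload = 42;
        payloadReady.store(true, std::memory_order_release);     // release barrier
    }

    void consumerThread()
    {
        while (!payloadReady.load(std::memory_order_acquire)) {} // acquire barrier
        // The acquire/release pair guarantees that the write to payload is
        // visible here on every architecture, whatever the caches do.
        int value = payload;
        (void)value;
    }
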
> >
> > Let's say that CPU A writes to the volatile and CPU B reads from
> > it. Isn't it the case that A's write to the volatile must cause a true
> > memory store and not just a write to cache? (Again, memory-mapped
> > IO would not work if the store is just a write to cache.)
> >
> > Then when CPU B reads the volatile, mustn't it perform an actual
> > memory fetch, picking up the result of A's memory store?
>
> That depends on the architecture.
>
> If the write to memory had release semantics and the read had acquire
> semantics, then the two CPUs must -- somehow -- figure it out and synchronise.
> For example, on IA-64, the store-release causes CPU A to mark the address
> as modified in the L3 off-die cache and the load-acquire from CPU B
> requires it to go check the L3 cache.
>
> On x86, the store from CPU A causes it to go and invalidate all cachelines
> containing that address in the other CPUs' caches. So when CPU B tries to
> read, it will be forced to go to main memory or the L3 off-die cache.
>
> On those two architectures, a "volatile" qualifier is enough to ensure
> proper behaviour: on x86, because all loads and stores are fully ordered
> anyway, and on IA-64, because the ABI requires volatile loads to acquire
> and volatile stores to release.
>
> But if you go beyond those two Intel architectures, all bets are off. On
> ARMv7, for example, the ABI does not require a volatile load or store to
> insert the "dmb" instruction. That means that, in your example, CPU B would
> not read from main memory and could fetch a stale value from one of its
> caches. The same goes for PowerPC/POWER, MIPS, SPARC, etc.
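
A rough sketch of that difference on ARMv7, as I understand it: a volatile
store compiles to a plain store, while a release store through std::atomic
makes the compiler insert the dmb:

    #include <atomic>

    volatile bool vFlag = false;
    std::atomic<bool> aFlag(false);

    void signalVolatile()
    {
        vFlag = true;                                 // plain store, no barrier
    }

    void signalAtomic()
    {
        aFlag.store(true, std::memory_order_release); // dmb + store on ARMv7
    }
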
>
> > Let me state for the record that I do not use volatiles for thread
> > synchronization. But the issue at hand is not whether a volatile
> > can be used for full-featured thread synchronization, but whether
> > it can be used by one thread to signal a second looping thread
> > to quit.
>
> It can. In that restricted scenario, even a non-atomic write would be
> sufficient.
>
> > It seems to me that volatile must cause the signalling
> > thread to perform an actual memory store and the thread to be
> > signalled to perform an actual memory fetch. Yes, there is a
> > potential race condition in that the looping thread may read the
> > volatile after the signalling thread has written to it, but because
> > the thread is looping, it will come around and read the volatile
> > again, picking up the new value that signals it to quit.
>
> Correct.
>
> > Is there any plausible reading of the standard by which this
> > signalling mechanism could fail? Do you know of any mainstream
> > platforms (hardware, os, and compiler) on which this signalling
> > mechanism does fail?
>
> No, I can't think of any for this particular case.
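
For the record, the quit-flag pattern discussed above can also be written
portably with std::atomic<bool> instead of volatile; a minimal sketch:

    #include <atomic>
    #include <thread>

    std::atomic<bool> quitRequested(false);

    void workerLoop()
    {
        while (!quitRequested.load(std::memory_order_relaxed)) {
            // do one unit of work; the flag is re-read on every iteration,
            // so the thread notices the signal the next time around the loop
        }
    }

    int main()
    {
        std::thread worker(workerLoop);
        quitRequested.store(true, std::memory_order_relaxed); // signal the quit
        worker.join();
        return 0;
    }
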
--
** Qt Developer Conference: http://qtconference.kdab.com/ **
Thomas McGuire | thomas.mcguire at kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel. Germany +49-30-521325470, Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-independent software solutions