[Interest] Heavily Commented Example: Simple Single Frontend with Two Backends

Wed Oct 24 00:11:37 CEST 2012

On Tuesday, 23 October 2012 16:14:11, K. Frank wrote:
> > volatile only forces the compiler to create an instruction to fetch the
> > variable from memory again, to prevent caching in a register. The CPU
> > doesn't even know about the volatile keyword anymore, it just sees a
> > normal fetch instruction, and can therefore use the CPU cache.
> 
> If I understand the original purpose of volatile, to confirm, reading a
> volatile should cause a true memory fetch, not just a fetch from
> cache.  (As you mention below, if volatile is used for memory-mapped
> IO, that IO won't actually occur if the fetch is just a cache fetch.)

No, that's not what it means.

A volatile load requires the compiler to *load* again, but it doesn't instruct 
the processor how to load it. The processor may serve the load from any cache 
level or from main memory.

That holds unless the ABI says otherwise. The only ABI I know of that does is
IA-64's, which requires a volatile load to be done using the "load acquire"
instruction.

What you're missing is that MMIO requires the memory address to be
uncacheable. That means the processor will bypass all cache levels and will
just issue the right load on the memory bus. But all of that is outside the
compiler's control. It simply loads from the address you gave it.
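
To make the division of labour concrete, here is a minimal sketch of an
MMIO-style polling loop. The function name wait_ready, the READY bit and the
idea of passing the register address as a pointer are all hypothetical and
chosen only for illustration. The volatile qualifier makes the compiler emit
one load per iteration; whether that load reaches the device depends on the
address being mapped uncacheable, which the code cannot express.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status bit, not taken from any real device.
const std::uint32_t READY = 0x1;

// Polls a status register until READY is set or max_polls loads were made.
// Because *reg is volatile, the compiler may not hoist the load out of the
// loop or keep the value in a register: every iteration loads again.
// How the processor serves that load (cache or bus) is decided by the
// memory type of the address, not by anything in this function.
inline bool wait_ready(const volatile std::uint32_t *reg, unsigned max_polls)
{
    assert(reg != nullptr);
    for (unsigned i = 0; i < max_polls; ++i) {
        if (*reg & READY)
            return true;
    }
    return false;
}
```

On real hardware the pointer would come from whatever mapping facility the
platform provides. For experimenting, an ordinary volatile variable can stand
in for the register.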

> > Therefore, if your two threads are living on different CPUs, one CPU might
> > not see the update on the other CPU, since the CPU caches are not
> > updated. volatile does not help with that, you need proper memory
> > barriers.
> Let's say that CPU A writes to the volatile and CPU B reads from
> it.  Isn't it the case that A's write to the volatile must cause a true
> memory store and not just a write to cache?  (Again, memory-mapped
> IO would not work if the store is just a write to cache.)
> 
> Then when CPU B reads the volatile, mustn't it perform an actual
> memory fetch, picking up the result of A's memory store?

That depends on the architecture.

If the write to memory had release semantics and the read had acquire 
semantics, then the two CPUs must -- somehow -- figure out and synchronise. For 
example, on IA-64, the store-release causes CPU A to mark the address as 
modified in the L3 off-die cache and the load-acquire from CPU B requires it to 
go check the L3 cache.

On x86, the store from CPU A causes it to go and invalidate all cachelines 
containing that address in the other CPUs' caches. So when CPU B tries to 
read, it will be forced to go to main memory or the L3 off-die cache.

On those two architectures, a "volatile" qualifier is enough to ensure proper
behaviour: on x86, because ordinary loads and stores already behave as
acquires and releases, and on IA-64, because the ABI requires volatile loads
to acquire and volatile stores to release.

But if you go beyond those two Intel architectures, all bets are off. On ARMv7,
for example, the ABI does not require a volatile load or store to insert the
"dmb" instruction. That means in your example, CPU B would not be required to
read from main memory and could fetch the value from one of its stale caches.
The same goes for PowerPC/POWER, MIPS, SPARC, etc.
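
For comparison, this is roughly what asking for release and acquire
explicitly looks like with C++11's std::atomic, so that the compiler inserts
whatever barrier the target needs. It is only a sketch: the Mailbox type and
the publish and try_consume functions are made-up names, and the mailbox is
assumed to be written once.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-use mailbox: one writer publishes one value.
struct Mailbox {
    int payload = 0;                  // plain data, written before the flag
    std::atomic<bool> ready{false};   // publication flag
};

// Writer side ("CPU A"): the store-release guarantees that the payload
// write is visible to any thread that later observes ready == true with
// an acquire load.
inline void publish(Mailbox &box, int value)
{
    assert(!box.ready.load(std::memory_order_relaxed)); // single use only
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Reader side ("CPU B"): the load-acquire pairs with the store-release
// above, so reading payload after seeing the flag is well defined.
inline bool try_consume(const Mailbox &box, int &out)
{
    if (!box.ready.load(std::memory_order_acquire))
        return false;
    out = box.payload;
    return true;
}
```

The point of spelling out the memory order is that the same source compiles
to plain moves where the hardware already gives these guarantees and to
explicit barriers where it does not.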

> Let me state for the record that I do not use volatiles for thread
> synchronization.  But the issue at hand is not whether a volatile
> can be used for full-featured thread synchronization, but whether
> it can be used by one thread to signal a second looping thread
> to quit.

It can. In that restricted scenario, even a non-atomic write would be 
sufficient.

> It seems to me that volatile must cause the signalling
> thread to perform an actual memory store and the thread to be
> signalled to perform an actual memory fetch.  Yes, there is a
> potential race condition in that the looping thread may read the
> volatile after the signalling thread has written to it, but because
> the thread is looping, it will come around and read the volatile
> again, picking up the new value that signals it to quit.

Correct.
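
A minimal sketch of that quit-flag pattern follows. Note the swap: instead
of a volatile bool it uses std::atomic<bool> with relaxed ordering, which is
an assumption of this sketch and not something from the discussion above. It
asks for the same thing (no ordering, just a store the loop eventually sees)
while staying inside the C++11 memory model. The function name is
hypothetical, and the program needs to be built with thread support.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Starts a looping worker, signals it to quit, and returns how many
// passes the worker made before it noticed the flag.
inline unsigned long run_worker_until_told_to_quit()
{
    std::atomic<bool> quit(false);
    unsigned long iterations = 0;

    std::thread worker([&quit, &iterations] {
        // The flag is re-read on every pass. Seeing the store one pass
        // late costs only an extra iteration, which is why no ordering
        // stronger than relaxed is requested here.
        do {
            ++iterations;
        } while (!quit.load(std::memory_order_relaxed));
    });

    quit.store(true, std::memory_order_relaxed);  // signal the worker
    worker.join();  // join() synchronises, so iterations is safe to read

    assert(iterations >= 1);  // the do-while body runs at least once
    return iterations;
}
```

The returned count varies from run to run, since it depends on how quickly
the worker observes the store.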

> Is there any plausible reading of the standard by which this
> signalling mechanism could fail?  Do you know of any mainstream
> platforms (hardware, os, and compiler) on which this signalling
> mechanism does fail?

No, I can't think of any for this particular case.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center