[Interest] Heavily Commented Example: Simple Single Frontend with Two Backends
Thomas McGuire
thomas.mcguire at kdab.com
Wed Oct 24 10:21:36 CEST 2012
Hi,
thanks a lot, Thiago, for expanding on this, especially on those areas I didn't
have good knowledge about. I didn't know that all loads and stores on x86 are
fully ordered; reading something about a "store buffer" had made me believe
otherwise.
Quite interesting topic, this.
Thanks,
Thomas
On Wednesday 24 October 2012 00:11:37 Thiago Macieira wrote:
> On Tuesday, 23 October 2012 16.14.11, K. Frank wrote:
> > > volatile only forces the compiler to create an instruction to fetch the
> > > variable from memory again, to prevent caching in a register. The CPU
> > > doesn't even know about the volatile keyword anymore, it just sees a
> > > normal fetch instruction, and can therefore use the CPU cache.
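
To make that concrete, here is a minimal C++ sketch (the names are made up)
of what volatile does and does not buy you:

    bool plainFlag = false;
    volatile bool volatileFlag = false;

    void waitPlain()
    {
        // The compiler may hoist the load out of the loop and spin forever
        // on a register copy of plainFlag.
        while (!plainFlag) {}
    }

    void waitVolatile()
    {
        // The compiler has to emit a load instruction on every iteration,
        // but the CPU is still free to serve that load from its caches.
        while (!volatileFlag) {}
    }
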
> >
> > If I understand the original purpose of volatile, to confirm, reading a
> > volatile should cause a true memory fetch, not just a fetch from
> > cache. (As you mention below, if volatile is used for memory-mapped
> > IO, that IO won't actually occur if the fetch is just a cache fetch.)
>
> No, that's not what it means.
>
> A volatile load requires the compiler to *load* again, but it doesn't
> instruct the processor how to load it. The processor may serve the load
> from any cache level or from main memory.
>
> Unless the ABI says otherwise. The only ABI I know of that says otherwise is
> IA-64's, which requires a volatile load to be done using the "load acquire"
> instruction.
>
> What you're missing is that MMIO requires the memory address to be
> uncacheable. That means the processor will bypass all cache levels and will
> just issue the right load in the memory bus. But all of that is outside the
> compiler's control. It simply loads from an address you gave it.
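
As an illustration (the register address below is invented, and on real
hardware the OS or firmware also has to map that page as uncacheable),
volatile only guarantees that the access is emitted at all:

    #include <cstdint>

    // 0xFEED0000 is a made-up device register address.
    volatile std::uint32_t *statusReg =
        reinterpret_cast<volatile std::uint32_t *>(0xFEED0000u);

    std::uint32_t readStatus()
    {
        // volatile forces the compiler to emit this load on every call;
        // whether the load bypasses the caches depends on how the page is
        // mapped, not on the keyword.
        return *statusReg;
    }
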
>
> > > Therefore, if your two threads are living on different CPUs, one CPU
> > > might not see the update on the other CPU, since the CPU caches are
> > > not updated. volatile does not help with that, you need proper memory
> > > barriers.
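
The portable way to get those barriers these days is std::atomic with
acquire/release ordering; a minimal sketch (variable names invented):

    #include <atomic>

    int payload = 0;                        // plain, non-atomic data
    std::atomic<bool> payloadReady(false);

    void producerThread()
    {
        payload = 42;
        payloadReady.store(true, std::memory_order_release);     // release barrier
    }

    void consumerThread()
    {
        while (!payloadReady.load(std::memory_order_acquire)) {} // acquire barrier
        // The acquire/release pair guarantees that the write to payload is
        // visible here on every architecture, whatever the caches do.
        int value = payload;
        (void)value;
    }
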
> >
> > Let's say that CPU A writes to the volatile and CPU B reads from
> > it. Isn't it the case that A's write to the volatile must cause a true
> > memory store and not just a write to cache? (Again, memory-mapped
> > IO would not work if the store is just a write to cache.)
> >
> > Then when CPU B reads the volatile, mustn't it perform an actual
> > memory fetch, picking up the result of A's memory store?
>
> That depends on the architecture.
>
> If the write to memory had release semantics and the read had acquire
> semantics, then the two CPUs must -- somehow -- figure it out and synchronise.
> For example, on IA-64, the store-release causes CPU A to mark the address
> as modified in the L3 off-die cache and the load-acquire from CPU B
> requires it to go check the L3 cache.
>
> On x86, the store from CPU A causes it to go and invalidate all cachelines
> containing that address in the other CPUs' caches. So when CPU B tries to
> read, it will be forced to go to main memory or the L3 off-die cache.
>
> On those two architectures, a "volatile" qualifier is enough to ensure
> proper behaviour: on x86, because all loads and stores are fully ordered
> anyway, and on IA-64, because the ABI requires volatile loads to acquire
> and volatile stores to release.
>
> But if you go beyond those two Intel architectures, all bets are off. On
> ARMv7, for example, the ABI does not require a volatile load or store to
> insert the "dmb" instruction. That means that, in your example, CPU B would
> not read from main memory and could fetch a stale value from one of its
> caches. The same goes for PowerPC/POWER, MIPS, SPARC, etc.
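
A rough sketch of that difference on ARMv7, as I understand it: a volatile
store compiles to a plain store, while a release store through std::atomic
makes the compiler insert the dmb:

    #include <atomic>

    volatile bool vFlag = false;
    std::atomic<bool> aFlag(false);

    void signalVolatile()
    {
        vFlag = true;                                 // plain store, no barrier
    }

    void signalAtomic()
    {
        aFlag.store(true, std::memory_order_release); // dmb + store on ARMv7
    }
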
>
> > Let me state for the record that I do not use volatiles for thread
> > synchronization. But the issue at hand is not whether a volatile
> > can be used for full-featured thread synchronization, but whether
> > it can be used by one thread to signal a second looping thread
> > to quit.
>
> It can. In that restricted scenario, even a non-atomic write would be
> sufficient.
>
> > It seems to me that volatile must cause the signalling
> > thread to perform an actual memory store and the thread to be
> > signalled to perform an actual memory fetch. Yes, there is a
> > potential race condition in that the looping thread may read the
> > volatile after the signalling thread has written to it, but because
> > the thread is looping, it will come around and read the volatile
> > again, picking up the new value that signals it to quit.
>
> Correct.
>
> > Is there any plausible reading of the standard by which this
> > signalling mechanism could fail? Do you know of any mainstream
> > platforms (hardware, os, and compiler) on which this signalling
> > mechanism does fail?
>
> No, I can't think of any for this particular case.
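
For the record, the quit-flag pattern discussed above can also be written
portably with std::atomic<bool> instead of volatile; a minimal sketch:

    #include <atomic>
    #include <thread>

    std::atomic<bool> quitRequested(false);

    void workerLoop()
    {
        while (!quitRequested.load(std::memory_order_relaxed)) {
            // do one unit of work; the flag is re-read on every iteration,
            // so the thread notices the signal the next time around the loop
        }
    }

    int main()
    {
        std::thread worker(workerLoop);
        quitRequested.store(true, std::memory_order_relaxed); // signal the quit
        worker.join();
        return 0;
    }
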
--
** Qt Developer Conference: http://qtconference.kdab.com/ **
Thomas McGuire | thomas.mcguire at kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel. Germany +49-30-521325470, Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-independent software solutions