[Interest] Heavily Commented Example: Simple Single Frontend with Two Backends
serge.borovkov at gmail.com
Tue Oct 23 21:58:41 CEST 2012
I did some googling and found this.
So you are saying that it's OK to use volatile on x86_64 here since there is
no actual synchronization here? Or am I missing something, and the cases
are different? Sorry for nitpicking, I am just trying to fully understand
what's happening in this case.
On Tue, Oct 23, 2012 at 10:24 PM, Thiago Macieira <thiago.macieira at intel.com> wrote:
> On Tuesday, 23 October 2012 12:14:37, Till Oliver Knoll wrote:
> > - Use "QAtomicInt" instead of a "volatile bool" (for the simple "Stop
> > thread" use case) or
> The problems with volatile bool:
> 1) volatile wasn't designed for threading.
> It was designed for memory-mapped I/O. Its purpose is to make sure that there
> are no more and no fewer reads from the variable and writes to it than what
> the code does. If I write:
> a = 1;
> a = 1;
> I want the compiler to store 1 twice. If this is MMIO, then I might need the
> value of 0x01 sent twice over my I/O device.
> For threading, however, that's irrelevant. Storing the same value twice,
> especially sequentially like that, makes no sense. I won't bother explaining
> why because you can see it with little thought.
> What's more, CPU architectures don't work like that either. Writes are buffered
> and then sent to the main RAM and other CPUs later, in bursts. Writing twice
> to memory, especially sequentially, will almost certainly result in RAM being
> written to only once. And besides, there's no way to detect that a location in
> memory has been overwritten with the same value.
> For those reasons, the semantics of volatile don't match the needs of threading.
> 2) volatile isn't atomic.
> a) for all types
> All CPU architectures I know of have at least one size that they can read and
> write in a single operation. It's the machine word, which usually corresponds
> to the register size.
> Complex and modern CPUs are often able to read and write data types of
> different sizes in atomic operations, but there are many examples of CPUs that
> can't do it. The only way to store an 8-bit value is to load the entire word
> where that 8-bit value is located, merge it in and then store the full word. A
> read-modify-write sequence is definitely not an atomic store.
> The C++ bool type is 1 byte in size, so it suffers from this problem. So
> we have a conclusion: you'd never use volatile bool, you'd use volatile
> sig_atomic_t (a type that is required by POSIX to have atomic loads and stores).
> b) for all operations
> Even if you follow the POSIX recommendations and use a sig_atomic_t for your
> variable, most other operations aren't atomic. On most architectures,
> incrementing and decrementing isn't atomic. And if you're trying to do
> synchronisation, you often need higher operations like fetch-and-add,
> compare-and-swap or simple swap.
> 3) volatile does not (usually) generate memory barriers
> There are two types of memory barriers: compiler and processor ones. Take the
> following code:
> value = 123456;
> spinlock = 0;
> Where spinlock is a volatile int. Two levels of things might go wrong here:
> first, since there's no compiler barrier, the compiler might generate code that
> stores the 0 to the spinlock (unlocking it) before it generates the code that
> saves the more complex value to the other variable.
> I'm not even talking hypotheticals or obscure architectures. This is what an
> ARMv7 compiler generated for me:
> movw r1, #57920
> mov r0, #0
> movt r1, 1
> str r0, [r2, #0]
> str r1, [r3, #0]
> This example was intentional because I knew that ARM can't load a large value
> into a register in a single instruction. Loading 123456 requires two
> instructions (move and move top). So I expected the compiler to schedule the
> saving of 0 before the saving of the more complex value, and it did.
> And even when it does schedule things in the correct order, the memory barrier
> might be missing. Taking again the example of ARMv7, saving a zero to the value
> and unlocking the mutex:
> mov r1, #0
> str r1, [r2, #0]
> str r1, [r3, #0]
> The ARMv7 architecture, unlike x86, *does* allow the processor to write to
> main RAM in any order. That means another core could see the spinlock
> being unlocked *before* the new value is stored, even if the compiler
> generated the proper instructions. It's missing the memory barrier instruction.
> The Qt 4 QAtomicInt API does not offer a load-acquire or a store-release
> operation. All reads and writes are non-atomic and may be problematic -- you
> can work around that by using a fetch-and-add of zero for load or a
> fetch-and-store for store.
> The Qt 5 API does offer the right functions and even requires you to think
> about it.
> The reason I said "usually" is because there is one architecture whose ABI
> requires acquire semantics for volatile loads and release semantics for
> volatile stores. That's IA-64, an architecture that was introduced after
> multithreading became mainstream and has a specific "load acquire" instruction
> anyway. The IA-64 manual explaining the memory ordering and barriers is one of
> the references I use to study the subject.
> 4) compilers have bugs
> In this case, there's little we can do but work around them. This problem was
> found by the kernel developers in GCC. They had a structure like:
> int field1;
> volatile int field2;
> On a 64-bit architecture, to modify "field1", the compiler generated a full
> read-modify-write of the full 64-bit word, including the overwriting of the
> volatile field. In other words, the compiler was clearly violating the
> specs, since it generated a write to a volatile that didn't exist in the
> source code.
> In this particular case, QAtomicInt wouldn't protect you.
> Thiago Macieira - thiago.macieira (AT) intel.com
> Software Architect - Intel Open Source Technology Center
> Interest mailing list
> Interest at qt-project.org