[Interest] Heavily Commented Example: Simple Single Frontend with Two Backends
thiago.macieira at intel.com
Tue Oct 23 20:24:29 CEST 2012
On Tuesday, 23 October 2012 12.14.37, Till Oliver Knoll wrote:
> - Use "QAtomicInt" instead of a "volatile bool" (for the simple "Stop
> thread" use case) or
The problems with volatile bool:
1) volatile wasn't designed for threading.
It was designed for memory-mapped I/O. Its purpose is to make sure that there
are no more and no fewer reads from the variable and writes to it than what
the code does. If I write:
a = 1;
a = 1;
I want the compiler to store 1 twice. If this is MMIO, then I might need that
value of 0x01 sent twice over my I/O device.
For threading, however, that's irrelevant. Storing the same value twice,
especially sequentially like that, makes no sense. I won't bother explaining
why because you can see it with little thought.
What's more, CPU architectures don't work like that either. Writes are cached
and then sent to the main RAM and other CPUs later, in bursts. Writing twice
to memory, especially sequentially, will almost certainly result in RAM being
written to only once. And besides, there's no way to detect that a location in
memory has been overwritten with the same value.
For those reasons, the semantics of volatile don't match the needs of
threading.
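To make the contrast concrete, here is a minimal sketch of the "stop thread" use case written with an atomic flag instead of a volatile bool. It uses C++11 std::atomic rather than QAtomicInt, purely so that the example is self-contained; the Worker type and the runAndStop() helper are hypothetical names invented for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker: spins until another thread asks it to stop.
struct Worker {
    std::atomic<bool> stopRequested{false};
    std::atomic<long> iterations{0};

    void run() {
        // The acquire load pairs with the release store in requestStop().
        while (!stopRequested.load(std::memory_order_acquire))
            iterations.fetch_add(1, std::memory_order_relaxed);
    }

    void requestStop() {
        stopRequested.store(true, std::memory_order_release);
    }
};

// Starts the worker, waits until it has made progress, then stops it.
// Returns the number of loop iterations the worker completed.
long runAndStop() {
    Worker w;
    std::thread t([&w] { w.run(); });
    while (w.iterations.load(std::memory_order_relaxed) == 0)
        std::this_thread::yield();
    w.requestStop();
    t.join();
    return w.iterations.load();
}
```

Calling runAndStop() should return once the worker has observed the flag; the exact iteration count depends on scheduling.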
2) volatile isn't atomic.
a) for all types
All CPU architectures I know of have at least one size that they can read and
write in a single operation. It's the machine word, which usually corresponds
to the register size.
Complex and modern CPUs are often able to read and write data types of
different sizes in atomic operations, but there are many examples of CPUs that
can't do it. The only way to store an 8-bit value is to load the entire word
where that 8-bit value is located, merge it in and then store the full word. A
read-modify-write sequence is definitely not an atomic store.
The C++ bool type is 1 byte in size, so it suffers from this problem. So here
we have a conclusion: you'd never use volatile bool, you'd use volatile
sig_atomic_t (a type that is required by POSIX to have atomic loads and
stores).
b) for all operations
Even if you follow the POSIX recommendations and use a sig_atomic_t for your
variable, most other operations aren't atomic. On most architectures,
incrementing and decrementing isn't atomic. And if you're trying to do thread
synchronisation, you often need higher operations like fetch-and-add, compare-
and-swap or simple swap.
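As a rough illustration of those higher operations, the sketch below uses C++11 std::atomic (an assumption made for self-containment, not the Qt API) to show a fetch-and-add counter shared by several threads and a compare-and-swap used to claim ownership. The function names countWithFetchAdd and tryClaim are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread adds 1 to the shared counter perThread times.
// fetch_add is a single atomic read-modify-write, so no increment is lost.
int countWithFetchAdd(int threads, int perThread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&counter, perThread] {
            for (int j = 0; j < perThread; ++j)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto &t : pool)
        t.join();
    return counter.load();
}

// Compare-and-swap: succeeds only if owner is still 0 (unowned),
// in which case it is atomically replaced by myId.
bool tryClaim(std::atomic<int> &owner, int myId) {
    int expected = 0;
    return owner.compare_exchange_strong(expected, myId);
}
```

With a plain or volatile int in place of the atomic counter, the concurrent increments could overwrite one another and the total would not be reliable.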
3) volatile does not (usually) generate memory barriers
There are two types of memory barriers: compiler and processor ones. Take the
following code:
value = 123456;
spinlock = 0;
Where spinlock is a volatile int. Two levels of things might go wrong there:
first, since there's no compiler barrier, the compiler might generate code that
stores the 0 to the spinlock (unlocking it) before it generates the code that
saves the more complex value to the other variable.
I'm not even talking hypotheticals or obscure architectures. This is what the
ARMv7 compiler generated for me:
movw r1, #57920
mov r0, #0
movt r1, 1
str r0, [r2, #0]
str r1, [r3, #0]
This example was intentional because I knew that ARM can't load a large value
to a register in a single instruction. Loading 123456 requires two
instructions (move and move top). So I expected the compiler to schedule the
saving of 0 to before the saving of the more complex value and it did.
And even when it does schedule things in the correct order, the memory barrier
might be missing. Taking again the example of ARMv7, saving a zero to "value"
and unlocking the spinlock:
mov r1, #0
str r1, [r2, #0]
str r1, [r3, #0]
The ARMv7 architecture, unlike x86, *does* allow the processor to write to
main RAM in any order. That means another core could see the spinlock
being unlocked *before* the new value is stored, even if the compiler
generated the proper instructions. It's missing the memory barrier.
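For comparison, here is a sketch of the same value/spinlock pair with the ordering stated explicitly. It assumes C++11 std::atomic instead of the Qt classes; the names sharedValue, lockWord, writerUnlock, readerAfterUnlock and publishAndRead are hypothetical. The release store tells both the compiler and the processor that the write to the value must be visible before the unlock, and the acquire load is the matching half on the reading side.

```cpp
#include <atomic>
#include <thread>

int sharedValue = 0;           // plain data guarded by the lock word
std::atomic<int> lockWord{1};  // 1 = locked, 0 = unlocked

// Writer: store the value, then unlock with release semantics so the
// store to sharedValue cannot be reordered after the unlock.
void writerUnlock() {
    sharedValue = 123456;
    lockWord.store(0, std::memory_order_release);
}

// Reader: wait for the unlock with acquire semantics, then read the value.
int readerAfterUnlock() {
    while (lockWord.load(std::memory_order_acquire) != 0)
        std::this_thread::yield();
    return sharedValue;
}

// Runs one writer against one reader and returns what the reader saw.
int publishAndRead() {
    sharedValue = 0;
    lockWord.store(1);
    std::thread writer(writerUnlock);
    int seen = readerAfterUnlock();
    writer.join();
    return seen;
}
```

Because the reader only proceeds after an acquire load that observes the release store, it is guaranteed to see the value written before the unlock.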
The Qt 4 QAtomicInt API does not offer a load-acquire or a store-release
operation. All reads and writes are non-atomic and may be problematic -- you
can work around that by using a fetch-and-add of zero for load or a fetch-and-
store for store.
The Qt 5 API does offer the right functions and even requires you to think
The reason I said "usually" is because there is one architecture whose ABI
requires acquire semantics for volatile loads and release semantics for
volatile stores. That's IA-64, an architecture that was introduced after
multithreading became mainstream and has a specific "load acquire" instruction
anyway. The IA-64 manual explaining the memory ordering and barriers is one of
the references I use to study the subject.
4) compilers have bugs
In this case, there's little we can do but work around them. This problem was
found by the kernel developers in GCC. They had a structure like:
volatile int field2;
On a 64-bit architecture, to modify "field1", the compiler generated a full
read-modify-write of the full 64-bit word, including the overwriting of the
volatile field. In other words, the compiler was clearly violating the volatile
specs, since it generated a write to a volatile that didn't exist in the
source code.
In this particular case, QAtomicInt wouldn't protect you.
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center