[Interest] Heavily Commented Example: Simple Single Frontend with Two Backends

Tue Oct 23 20:24:29 CEST 2012

On Tuesday, 23 October 2012 12.14.37, Till Oliver Knoll wrote:
> - Use "QAtomicInt" instead of a "volatile bool" (for the simple "Stop
> thread" use case) or

The problems with volatile bool:

1) volatile wasn't designed for threading. 

It was designed for memory-mapped I/O. Its purpose is to make sure that there 
are no more and no fewer reads from the variable and writes to it than what 
the code does. If I write:
	a = 1;
	a = 1;
I want the compiler to store 1 twice. If this is MMIO, then I might need that 
value of 0x01 sent twice over my I/O device.

For threading, however, that's irrelevant. Storing the same value twice, 
especially sequentially like that, makes no sense. I won't bother explaining 
why because you can see it with little thought.

What's more, CPU architectures don't work like that either. Writes are cached 
and then sent to the main RAM and other CPUs later, in bursts. Writing twice 
to memory, especially sequentially, will almost certainly result in RAM being 
written to only once. And besides, there's no way to detect that a location in 
memory has been overwritten with the same value.

For those reasons, the semantics of volatile don't match the needs of 
threading.
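
For the "stop thread" use case quoted at the top, a minimal sketch of an 
atomic stop flag might look like the following. It uses C++11 std::atomic 
instead of QAtomicInt only so that it compiles without Qt; the Worker type and 
the runAndStop() helper are hypothetical names made up for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker: spins until another thread asks it to stop.
struct Worker {
    std::atomic<bool> stop{false};
    long iterations = 0;   // touched only by the worker thread until join()

    void run() {
        // Atomic load: it is re-read on every iteration and cannot be
        // hoisted out of the loop the way a plain bool could be.
        while (!stop.load(std::memory_order_acquire))
            ++iterations;
    }
};

// Starts the worker, requests the stop and returns how far it got.
long runAndStop() {
    Worker w;
    std::thread t(&Worker::run, &w);
    w.stop.store(true, std::memory_order_release);
    t.join();              // after join(), reading w.iterations is safe
    return w.iterations;
}
```

The iteration count depends on scheduling and may well be zero; the point is 
only that the loop is guaranteed to observe the store and terminate.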

2) volatile isn't atomic.
 a) for all types

All CPU architectures I know of have at least one size that they can read and 
write in a single operation. It's the machine word, which usually corresponds 
to the register size.

Complex and modern CPUs are often able to read and write data types of 
different sizes in atomic operations, but there are many examples of CPUs that 
can't do it. The only way to store an 8-bit value is to load the entire word 
where that 8-bit value is located, merge it in and then store the full word. A 
read-modify-write sequence is definitely not an atomic store.

The C++ bool type is usually 1 byte in size, so it suffers from this problem. 
So here we have a conclusion: you'd never use volatile bool, you'd use 
volatile sig_atomic_t (a type that is required by the C standard and POSIX to 
have atomic loads and stores, even in the presence of asynchronous signals).
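
As a small illustration of the one place where volatile sig_atomic_t is 
specified to work, here is a sketch of a flag set from a signal handler. The 
names gotSignal, onSignal and raiseAndCheck are invented for this example, and 
SIGINT is an arbitrary choice of signal.

```cpp
#include <csignal>

// Flag shared between normal code and the signal handler.
volatile std::sig_atomic_t gotSignal = 0;

// A handler may portably do little more than store to such a flag.
extern "C" void onSignal(int) {
    gotSignal = 1;
}

// Installs the handler, raises the signal in the calling thread,
// restores the default action and returns the flag value.
int raiseAndCheck() {
    gotSignal = 0;
    std::signal(SIGINT, onSignal);
    std::raise(SIGINT);              // handler runs before raise() returns
    std::signal(SIGINT, SIG_DFL);
    return gotSignal;
}
```

Note that this is about signals delivered to the same thread. It says nothing 
about visibility or ordering between threads, which is what the rest of this 
mail is about.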

 b) for all operations

Even if you follow the POSIX recommendations and use a sig_atomic_t for your 
variable, most other operations aren't atomic. On most architectures, 
incrementing and decrementing aren't atomic. And if you're trying to do thread 
synchronisation, you often need higher-level operations like fetch-and-add, 
compare-and-swap or simple swap.
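
To make those operations concrete, here is a sketch using C++11 std::atomic 
(QAtomicInt offers operations of the same kind under its own names). The 
helper names countWithFetchAdd and claimOnce are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each of `threads` threads adds 1 to a shared counter `perThread` times.
// With fetch_add no increment is lost; with a volatile int and ++ some
// increments could be lost.
int countWithFetchAdd(int threads, int perThread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.fetch_add(1);          // atomic read-modify-write
        });
    for (auto &th : pool)
        th.join();
    return counter.load();
}

// Compare-and-swap: only the caller that finds 0 gets to change it to 1.
bool claimOnce(std::atomic<int> &state) {
    int expected = 0;
    return state.compare_exchange_strong(expected, 1);
}
```

For instance, countWithFetchAdd(4, 10000) always yields 40000, and of any 
number of callers of claimOnce() on the same variable exactly one gets true.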

3) volatile does not (usually) generate memory barriers

There are two types of memory barriers: compiler and processor ones. Take the 
following code:

	value = 123456;
	spinlock = 0;

Where spinlock is a volatile int. Two levels of things might go wrong there: 
first, since there's no compiler barrier, the compiler might generate code that 
stores the 0 to the spinlock (unlocking it) before it generates the code that 
saves the more complex value to the other variable. 

I'm not even talking hypotheticals or obscure architectures. This is what the 
ARMv7 compiler generated for me:

        movw    r1, #57920
        mov     r0, #0
        movt    r1, 1
        str     r0, [r2, #0]
        str     r1, [r3, #0]

This example was intentional because I knew that ARM can't load a large value 
to a register in a single instruction. Loading 123456 requires two 
instructions (move and move top). So I expected the compiler to schedule the 
saving of 0 to before the saving of the more complex value and it did.

And even when it does schedule things in the correct order, the memory barrier 
might be missing. Taking again the example of ARMv7, saving a zero to "value" 
and unlocking the spinlock:

        mov     r1, #0
        str     r1, [r2, #0]
        str     r1, [r3, #0]

The ARMv7 architecture, unlike x86, *does* allow the processor to write to 
main RAM in any order. That means another core could see the spinlock 
being unlocked *before* the new value is stored, even if the compiler 
generated the proper instructions. It's missing the memory barrier 
instruction.
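
For comparison, this is roughly how the value/spinlock example could be 
written with explicit ordering, sketched with C++11 std::atomic. The function 
names publish, waitAndRead and publishAndRead are made up for the 
illustration, and "1 = locked, 0 = unlocked" is an assumption consistent with 
the snippet above.

```cpp
#include <atomic>
#include <thread>

int value = 0;                    // plain data guarded by the flag
std::atomic<int> spinlock{1};     // assumed: 1 = locked, 0 = unlocked

void publish() {
    value = 123456;
    // Release store: neither the compiler nor the CPU may move the
    // store to `value` after this store.
    spinlock.store(0, std::memory_order_release);
}

int waitAndRead() {
    // Acquire load: once 0 is observed, the store to `value` is visible.
    while (spinlock.load(std::memory_order_acquire) != 0)
        std::this_thread::yield();
    return value;
}

// Runs the writer in a second thread and reads from the calling thread.
int publishAndRead() {
    value = 0;
    spinlock.store(1);
    std::thread producer(publish);
    int seen = waitAndRead();
    producer.join();
    return seen;
}
```

The release/acquire pair is what obliges the compiler to emit the barrier 
instruction on architectures such as ARMv7 that need one.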

The Qt 4 QAtomicInt API does not offer a load-acquire or a store-release 
operation. All reads and writes are non-atomic and may be problematic -- you 
can work around that by using a fetch-and-add of zero for load or a fetch-and-
store for store. 

The Qt 5 API does offer the right functions and even requires you to think 
about it.
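
The workaround and the explicit operations can be sketched side by side. The 
code below uses C++11 std::atomic as a stand-in so that it compiles without 
Qt. The Qt names in the comments are QAtomicInt functions, but treating 
std::memory_order_seq_cst as the counterpart of Qt's "Ordered" variants is 
only an approximation made for this example, and the four helper names are 
hypothetical.

```cpp
#include <atomic>

// Load expressed as a read-modify-write that adds nothing
// (compare QAtomicInt::fetchAndAddOrdered(0) in Qt 4).
int loadViaFetchAdd(std::atomic<int> &a) {
    return a.fetch_add(0, std::memory_order_seq_cst);
}

// Store expressed as an exchange whose old value is discarded
// (compare QAtomicInt::fetchAndStoreOrdered(v) in Qt 4).
void storeViaExchange(std::atomic<int> &a, int v) {
    (void)a.exchange(v, std::memory_order_seq_cst);
}

// The operations that the Qt 5 API spells out directly
// (compare QAtomicInt::loadAcquire() and QAtomicInt::storeRelease(v)).
int loadAcquire(const std::atomic<int> &a) {
    return a.load(std::memory_order_acquire);
}

void storeRelease(std::atomic<int> &a, int v) {
    a.store(v, std::memory_order_release);
}
```

The read-modify-write forms are usually more expensive than a plain 
load-acquire or store-release, which is one reason to prefer the explicit 
functions where they are available.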

The reason I said "usually" is because there is one architecture whose ABI 
requires acquire semantics for volatile loads and release semantics for 
volatile stores. That's IA-64, an architecture that was introduced after 
multithreading became mainstream and has a specific "load acquire" instruction 
anyway. The IA-64 manual explaining the memory ordering and barriers is one of 
the references I use to study the subject.

4) compilers have bugs

In this case, there's little we can do but work around them. This problem was 
found by the kernel developers in GCC. They had a structure like:

	int field1;
	volatile int field2;

On a 64-bit architecture, to modify "field1", the compiler generated a full 
read-modify-write of the full 64-bit word, including the overwriting of the 
volatile field. In other words, the compiler was clearly violating the volatile 
specs, since it generated a write to a volatile that didn't exist in the 
source code.

In this particular case, QAtomicInt wouldn't protect you.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center