[Development] How qAsConst and qExchange lead to qNN

Tue Nov 15 17:25:27 CET 2022

On Tuesday, 15 November 2022 00:52:24 PST Marc Mutz via Development wrote:
> That remains to be proven. A rule of thumb for atomics is that they're
> two orders of magnitude slower than a normal int. They also still act as
> optimizer firewalls. With that rule of thumb, copying 50 char16_t's is
> faster than one ref-count update. What really is the deciding point is
> whether or not there's a memory allocation involved. I mentioned that
> for many use-cases, therefore, a non-CoW SBO container is preferable over a
> CoW non-SBO one.

That's irrelevant so long as we don't have SBO containers.

So what we really need to compare is memory allocations versus atomics. A
locked operation on a cacheline on x86 takes on the order of 20 cycles of
latency on top of any memory delays[1], but do note that the CPU keeps running
in the meantime (read: an atomic inc whose result is unused has a much smaller
impact than an atomic dec that uses the result). A memory allocation, even for
a single byte, has a bigger impact than this: hundreds of cycles.
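
To get a feel for these two costs on a given machine, here is a minimal
micro-benchmark sketch. All names in it (nanosecondsPerOp,
costOfAtomicIncrement, costOfAllocation, g_refCount, g_sink) are made up for
illustration and are not Qt API. It measures uncontended throughput in a tight
loop rather than latency, and a tight malloc/free loop tends to hit the
allocator's fast path, so it probably understates what an allocation costs in
a real program.

```cpp
#include <atomic>
#include <chrono>
#include <cstdlib>

// Hypothetical helper: average nanoseconds per call of `op` over `iterations` runs.
template <typename Op>
static double nanosecondsPerOp(long iterations, Op op)
{
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        op();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}

static std::atomic<int> g_refCount{0};   // stands in for a container's ref count
static void *volatile g_sink = nullptr;  // volatile sink so the allocation is not optimized away

// One ref-count update, as a CoW copy would do (a locked add on x86).
static double costOfAtomicIncrement(long iterations)
{
    return nanosecondsPerOp(iterations, [] {
        g_refCount.fetch_add(1, std::memory_order_relaxed);
    });
}

// One single-byte allocation plus its release, the minimum a deep copy needs.
static double costOfAllocation(long iterations)
{
    return nanosecondsPerOp(iterations, [] {
        void *p = std::malloc(1);
        g_sink = p;      // let the pointer escape before freeing it
        std::free(p);
    });
}
```

Call both functions with the same iteration count (say, a few million) and
compare the two results; multiply by the clock frequency in GHz to turn
nanoseconds into approximate cycles.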

Therefore, in the case of CoW versus deep copy, CoW always wins.
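
To make the comparison concrete, here is a minimal sketch of what each kind of
copy does. CowString, Header, allocateBlock, freeBlock and g_allocationCount
are hypothetical names for illustration; this is not how QString is actually
implemented, only the general shape of a CoW, non-SBO container. The counter
only tracks allocations made by the sketch itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

static int g_allocationCount = 0;   // heap allocations made by this sketch

// Header stored in front of the characters, in one heap block.
struct Header {
    std::atomic<int> ref{1};
    std::size_t size = 0;
};

static Header *allocateBlock(const char16_t *src, std::size_t n)
{
    ++g_allocationCount;
    void *raw = ::operator new(sizeof(Header) + n * sizeof(char16_t));
    Header *h = new (raw) Header;
    h->size = n;
    std::memcpy(h + 1, src, n * sizeof(char16_t));   // characters follow the header
    return h;
}

static void freeBlock(Header *h)
{
    h->~Header();
    ::operator delete(h);
}

class CowString {
public:
    CowString(const char16_t *src, std::size_t n) : d(allocateBlock(src, n)) {}

    // CoW copy: one atomic increment whose result is unused, no allocation.
    CowString(const CowString &other) : d(other.d)
    { d->ref.fetch_add(1, std::memory_order_relaxed); }

    CowString &operator=(const CowString &) = delete;   // kept out of the sketch

    // Atomic decrement that uses its result: the last owner frees the block.
    ~CowString()
    { if (d->ref.fetch_sub(1, std::memory_order_acq_rel) == 1) freeBlock(d); }

    // Deep copy: always a fresh allocation plus a memcpy.
    CowString deepCopy() const { return CowString(data(), d->size); }

    const char16_t *data() const { return reinterpret_cast<const char16_t *>(d + 1); }
    std::size_t size() const { return d->size; }
    int refCount() const { return d->ref.load(std::memory_order_relaxed); }

private:
    Header *d;
};
```

Copy-constructing a CowString leaves g_allocationCount unchanged and bumps the
shared ref count, while deepCopy() increases g_allocationCount by one and
returns a string with its own buffer.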

[1] https://uops.info/html-instr/INC_LOCK_M32.html says 23 cycles on an
11-year-old Sandy Bridge, 19 on Haswell, 18 on everything since Skylake.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering