[Development] How qAsConst and qExchange lead to qNN
Thiago Macieira
thiago.macieira at intel.com
Tue Nov 15 17:25:27 CET 2022
On Tuesday, 15 November 2022 00:52:24 PST Marc Mutz via Development wrote:
> That remains to be proven. A rule of thumb for atomics is that they're
> two orders of magnitude slower than a normal int. They also still act as
> optimizer firewalls. With that rule of thumb, copying 50 char16_t's is
> faster than one ref-count update. What really is the deciding point is
> whether or not there's a memory allocation involved. I mentioned that
> for many use-cases, therefore, a non-CoW SBO container is preferable over a
> CoW non-SBO one.
That's irrelevant so long as we don't have SBO containers.
So what we really need to compare is memory allocations versus atomics. A
locked operation on a cacheline on x86 takes on the order of 20 cycles of
latency on top of any memory delays[1], but do note that the CPU keeps running
in the meantime (read: an atomic inc has a much smaller impact than an atomic
dec that uses the result). A memory allocation, even for a single byte, has a
bigger impact than this: hundreds of cycles.
Therefore, in the case of CoW versus deep copy, CoW always wins.
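To make the comparison concrete, here is a minimal sketch of the two copy
strategies. All names (SharedData, allocate, cow_copy, deep_copy, release,
g_allocations) are made up for illustration and this is not how QArrayData is
actually laid out. It only shows where the locked increment, the locked
decrement that uses its result, and the allocation occur.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical illustration only; not Qt's actual data layout.
static std::size_t g_allocations = 0;   // number of allocate() calls so far

struct SharedData {
    std::atomic<int> ref{1};    // reference count shared by all CoW copies
    std::size_t size = 0;
    char16_t *chars = nullptr;
};

// Allocates a fresh payload and copies n code units into it.
SharedData *allocate(const char16_t *src, std::size_t n)
{
    ++g_allocations;
    auto *d = new SharedData;
    d->size = n;
    d->chars = new char16_t[n];
    std::memcpy(d->chars, src, n * sizeof(char16_t));
    return d;
}

// CoW copy: one atomic increment whose result is not used, no allocation.
SharedData *cow_copy(SharedData *d)
{
    d->ref.fetch_add(1, std::memory_order_relaxed);
    return d;
}

// Deep copy: always goes through the allocator, whatever the size.
SharedData *deep_copy(const SharedData *d)
{
    return allocate(d->chars, d->size);
}

// Release: an atomic decrement whose result IS used, to decide on freeing.
void release(SharedData *d)
{
    if (d->ref.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete[] d->chars;
        delete d;
    }
}
```

Without SBO, deep_copy() cannot avoid the allocator, so each copy pays the
allocation on top of the memcpy, whereas cow_copy() pays only the increment.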
[1] https://uops.info/html-instr/INC_LOCK_M32.html says 23 cycles on an
11-year-old Sandy Bridge, 19 on Haswell, 18 on everything since Skylake.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Cloud Software Architect - Intel DCAI Cloud Engineering