[Development] Why can't QString use UTF-8 internally?

Wed Feb 11 01:41:12 CET 2015

On Wednesday 11 February 2015 00:40:28 Marc Mutz wrote:
> On Tuesday 10 February 2015 22:26:50 Thiago Macieira wrote:
> > It's not insurmountable. I can think of two solutions:
> >  1) pre-allocate enough space for the UTF-16 data (strlen(utf8) * 2), so
> > that the const functions can implicitly write to the UTF-16 block when
> > needed. Since the original UTF-8 data is constant and if there are no
> > out-of-thin-air values, multiple threads could do this operation
> > simultaneously safely.
> 
> No, they can't. The writes conflict and neither happens-before the other ->
> data race -> UB.

There is a happens-before: the writing of the Latin1 or UTF-8 data
happens-before either thread converting it to UTF-16.

Indeed, the two writes are unsynchronised with respect to each other. But
every write stores the same data to the same memory positions, so their
relative ordering is irrelevant.

The C++ standard may say this is UB, but CPU architectures say this is 
perfectly well-defined.
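
For reference, the "same data to the same positions" idea can be written so
that the C++ memory model accepts it too, by making each store a relaxed
atomic. This is only a sketch under assumptions: DualBuffer, fill() and at()
are hypothetical names, not QString internals, the source is taken to be
Latin1 to keep the conversion to one line, and relaxed atomics are a
swapped-in technique rather than the plain writes discussed above.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical type: constant Latin1 source plus a pre-allocated UTF-16 block.
struct DualBuffer {
    explicit DualBuffer(const char *latin1)
        : len(std::strlen(latin1)),
          src(new char[len + 1]),
          dst(new std::atomic<char16_t>[len]())   // zero-initialised
    {
        std::memcpy(src.get(), latin1, len + 1);
    }

    // Callable from a const context by any number of threads at once:
    // every caller stores identical values, and relaxed atomic stores
    // to the same object do not constitute a data race.
    void fill() const
    {
        for (std::size_t i = 0; i < len; ++i)
            dst[i].store(static_cast<char16_t>(static_cast<unsigned char>(src[i])),
                         std::memory_order_relaxed);
    }

    // A thread that has called fill() reads either its own store or an
    // identical one from another thread.
    char16_t at(std::size_t i) const
    {
        return dst[i].load(std::memory_order_relaxed);
    }

    std::size_t len;
    std::unique_ptr<char[]> src;
    std::unique_ptr<std::atomic<char16_t>[]> dst;
};
```

On common architectures a relaxed 16-bit store typically compiles to an
ordinary store, so the cost is mostly in what the compiler may no longer
vectorise, not in extra fences.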

> Your #2 is sound, of course, as long as readers loadAcquire the utf16 data
> pointer.

Right. However, this atomic load and the need to verify the state before any 
operation is probably a deal-breaker.
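
A minimal sketch of what that state check looks like, assuming a Latin1
source and a separately allocated UTF-16 block that is published through an
atomic pointer. LazyUtf16 and its members are hypothetical names for
illustration only; this is not how QString is implemented.

```cpp
#include <atomic>
#include <string>
#include <utility>

// Hypothetical class: keeps the 8-bit source and converts on first use.
class LazyUtf16 {
public:
    explicit LazyUtf16(std::string latin1) : src_(std::move(latin1)) {}
    ~LazyUtf16() { delete utf16_.load(std::memory_order_relaxed); }
    LazyUtf16(const LazyUtf16 &) = delete;
    LazyUtf16 &operator=(const LazyUtf16 &) = delete;

    // const, yet it may publish the converted block on first call.
    const std::u16string &utf16() const
    {
        // Every reader pays for this acquire load and the null check.
        const std::u16string *p = utf16_.load(std::memory_order_acquire);
        if (p)
            return *p;

        // Convert into a private block that no other thread can see yet.
        auto *fresh = new std::u16string;
        fresh->reserve(src_.size());
        for (unsigned char c : src_)
            fresh->push_back(static_cast<char16_t>(c));

        // Publish it; if another thread won the race, use its block instead.
        const std::u16string *expected = nullptr;
        if (utf16_.compare_exchange_strong(expected, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire))
            return *fresh;
        delete fresh;
        return *expected;
    }

private:
    std::string src_;
    mutable std::atomic<const std::u16string *> utf16_{nullptr};
};
```

The load-and-branch at the top of utf16() is the per-operation cost in
question: it has to run before any function that needs the UTF-16 data.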

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center