[Development] Why can't QString use UTF-8 internally?
Thiago Macieira
thiago.macieira at intel.com
Wed Feb 11 01:41:12 CET 2015
On Wednesday 11 February 2015 00:40:28 Marc Mutz wrote:
> On Tuesday 10 February 2015 22:26:50 Thiago Macieira wrote:
> > It's not insurmountable. I can think of two solutions:
> > 1) pre-allocate enough space for the UTF-16 data (strlen(utf8) * 2), so
> > that the const functions can implicitly write to the UTF-16 block when
> > needed. Since the original UTF-8 data is constant and if there are no
> > out-of-thin-air values, multiple threads could do this operation
> > simultaneously safely.
>
> No, they can't. The writes conflict and neither happens-before the other ->
> data race -> UB.
There is a happens-before: writing the Latin-1 or UTF-8 data happens-before
either thread converts it to UTF-16.
Indeed, the two writes are unsynchronised with respect to each other. But all
of the writes store the same data to the same memory positions, so their
ordering is irrelevant.
The C++ standard may say this is UB, but at the level of the CPU architecture
it is perfectly well-defined.
> Your #2 is sound, of course, as long as readers loadAcquire the utf16 data
> pointer.
Right. However, this atomic load, together with the need to check the state
before every operation, is probably a deal-breaker.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center