[Development] Qt6: Adding UTF-8 storage support to QString

Fri Jan 18 21:54:21 CET 2019

On Friday, 18 January 2019 08:57:19 PST Tor Arne Vestbø wrote:
> > On 18 Jan 2019, at 17:21, Thiago Macieira <thiago.macieira at intel.com>
> > Actually, what we should do is allow everywhere
> > 
> > 	functionTakingString(u"Tor Arne Vestbø")
> > 	// (note the u)
> 
> Yes, this would be awesome! Please let’s do this 😊
> 
> And I guess without QT_NO_CAST_FROM_ASCII you’d still be able to do:
> 
>   functionTakingString("Tor Arne Vestbø") // without the ‘u’, runtime cost

Right, but given the benefit of char16_t literals, we should encourage 
QT_NO_CAST_FROM_ASCII even more! The u prefix is a single extra letter in 
your source, and even if the compiler is misconfigured and produces mojibake 
for your surname, my middle name or Jędrzej's first name, it will still work 
for US-ASCII content ("a broken clock is right twice a day" type of "work").
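To illustrate the point about char16_t literals, here is a minimal sketch in 
plain C++17 without Qt. The names functionTakingString and sameAsAscii are 
hypothetical stand-ins, not Qt API; the sketch only shows that a u"" literal 
is already UTF-16 at compile time and that US-ASCII content produces the same 
code unit values as the narrow literal.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for an API that accepts UTF-16 text (not a Qt
// function). It returns the number of UTF-16 code units it was given.
constexpr std::size_t functionTakingString(std::u16string_view s)
{
    return s.size();
}

// The u prefix makes the compiler emit UTF-16 code units at build time,
// so no decoding step is needed when the call happens.
static_assert(functionTakingString(u"Tor Arne") == 8);

// A universal character name does not depend on the source file encoding:
// U+00F8 is always one UTF-16 code unit.
static_assert(functionTakingString(u"Vestb\u00F8") == 6);

// For US-ASCII content, every UTF-16 code unit has the same value as the
// corresponding byte of the narrow literal.
constexpr bool sameAsAscii(std::u16string_view wide, std::string_view narrow)
{
    if (wide.size() != narrow.size())
        return false;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        if (wide[i] != static_cast<unsigned char>(narrow[i]))
            return false;
    }
    return true;
}

static_assert(sameAsAscii(u"Tor Arne", "Tor Arne"));
```

Writing the non-ASCII character as \u00F8 is only there to keep the sketch 
independent of how the compiler reads the source file, which is exactly the 
misconfiguration risk mentioned above.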

In fact, we ought to look into replacing our QLatin1String content with 
char16_t literals in our sources. Pros: avoids the Latin1 decoder, which is 
slower[¹] than a pure memcpy. Cons: doubles the size of the string data. So 
I'd use QLatin1String only for rarely used strings, where saving a few bytes 
is worth it.
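The trade-off can be sketched in plain C++17. This is a scalar illustration 
under my own assumptions, not Qt's qt_from_latin1_internal (which is 
vectorized); fromLatin1 and fromUtf16 are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Scalar sketch of Latin-1 decoding: every byte has to be zero-extended to
// a 16-bit code unit, so the data cannot simply be copied.
std::u16string fromLatin1(std::string_view latin1)
{
    std::u16string out(latin1.size(), u'\0');
    for (std::size_t i = 0; i < latin1.size(); ++i)
        out[i] = static_cast<char16_t>(static_cast<unsigned char>(latin1[i]));
    return out;
}

// A char16_t literal is already in the target encoding, so filling the
// destination buffer is a plain memcpy.
std::u16string fromUtf16(std::u16string_view utf16)
{
    std::u16string out(utf16.size(), u'\0');
    std::memcpy(out.data(), utf16.data(), utf16.size() * sizeof(char16_t));
    return out;
}

// The cost of skipping the decoder: the literal takes twice as many bytes
// (terminator included).
static_assert(sizeof(u"status") == 2 * sizeof("status"));
```

Both helpers produce the same UTF-16 result for the same text; the 
difference is only in the work done per character and in the size of the 
literal.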

[¹] See https://analysis.godbolt.org/z/OZ-5Gz, which contains the inner loop 
of qt_from_latin1_internal (an AVX2 build[²]), and compare it to an equivalent 
memcpy in https://analysis.godbolt.org/z/7vR2jW. Note that, according to 
llvm-mca, the memcpy loop has 3 fewer cycles of latency than the Latin1 
decoder. And this is not an optimal memcpy loop.

[²] Our builds are not AVX2 by default. You're only going to get this 
performance if you build with -march=native (Gentoo?) or you use Clear Linux. 
The defaults are much worse.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center