[Development] Qt6: Adding UTF-8 storage support to QString
Thiago Macieira
thiago.macieira at intel.com
Fri Jan 18 21:54:21 CET 2019
On Friday, 18 January 2019 08:57:19 PST Tor Arne Vestbø wrote:
> > On 18 Jan 2019, at 17:21, Thiago Macieira <thiago.macieira at intel.com>
> > Actually, what we should do is allow everywhere
> >
> > functionTakingString(u"Tor Arne Vestbø")
> > // (note the u)
>
> Yes, this would be awesome! Please let’s do this 😊
>
> And I guess without QT_NO_CAST_FROM_ASCII you’d still be able to do:
>
> functionTakingString("Tor Arne Vestbø") // without the ‘u’, runtime cost
Right, but given the benefit of char16_t literals, we should encourage
QT_NO_CAST_FROM_ASCII even more! It's a single extra letter in your source, and
even if the compiler is misconfigured and produces mojibake for your
surname, my middle name or Jędrzej's first name, it will still work for
US-ASCII content ("a broken clock is right twice a day" type of "work").
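As a rough sketch of the difference, in plain C++17 without Qt: functionTakingString below is a made-up stand-in taking a std::u16string_view, not a real Qt signature. The only point is that a u"" literal is already UTF-16 in the binary, while a narrow literal is a run of bytes that something has to decode at runtime.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a function taking a string; not a real Qt API.
// It just reports how many UTF-16 code units it was handed.
constexpr std::size_t functionTakingString(std::u16string_view s)
{
    return s.size();
}

// "Tor Arne Vestbø", with the ø written as an escape so that this sketch
// does not depend on how the compiler reads the source file's encoding.
constexpr std::u16string_view utf16Name = u"Tor Arne Vestb\u00f8";

// The same text as UTF-8 bytes: ø becomes the two bytes 0xC3 0xB8.
constexpr std::string_view utf8Name = "Tor Arne Vestb\xc3\xb8";

// The char16_t literal is stored as UTF-16 by the compiler: 15 code units,
// usable as-is. The narrow literal is 16 bytes that still need decoding.
static_assert(functionTakingString(utf16Name) == 15);
static_assert(utf8Name.size() == 16);
```

For pure US-ASCII text the two forms have the same length in code units, which is why that content survives even a misconfigured compiler.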
In fact, we ought to look into replacing our QLatin1String content with
char16_t literals in our sources. Pros: avoid the Latin1 decoder, which is
slower[¹] than a pure memcpy. Cons: doubles the size of the string. So I'd use
QLatin1String only for uncommonly used strings, where saving a few bytes is
worth it.
[¹] See https://analysis.godbolt.org/z/OZ-5Gz, which contains the inner loop
of qt_from_latin1_internal (an AVX2 build[²]), and compare it to an equivalent
memcpy in https://analysis.godbolt.org/z/7vR2jW. Note how the memcpy loop,
according to llvm-mca, has 3 fewer cycles of latency than the Latin1 decoder.
And this is not an optimal memcpy loop.
[²] Our builds are not AVX2 by default. You're only going to get this
performance if you build with -march=native (Gentoo?) or you use Clear Linux.
The defaults are much worse.
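To illustrate the trade-off, here is a minimal scalar sketch in plain C++17. It is not Qt's code: latin1ToUtf16 and copyUtf16 are hypothetical helper names, and the real qt_from_latin1_internal is vectorised. It only shows what each path has to do per character, and the size cost of storing UTF-16 up front.

```cpp
#include <cstddef>
#include <cstring>

// Scalar sketch of a Latin-1 decoder: every byte is zero-extended to one
// UTF-16 code unit. Hypothetical helper, not Qt's implementation.
inline void latin1ToUtf16(char16_t *dst, const char *src, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        dst[i] = static_cast<unsigned char>(src[i]);
}

// With a char16_t literal the data is already UTF-16, so a plain copy of
// len code units is all that is needed.
inline void copyUtf16(char16_t *dst, const char16_t *src, std::size_t len)
{
    std::memcpy(dst, src, len * sizeof(char16_t));
}

// The cost of the second approach: the same text takes twice the bytes
// in the binary (terminator included).
static_assert(sizeof(u"Qt") == 2 * sizeof("Qt"));
```

Both functions produce the same buffer for Latin-1 input; the decoder pays for a widening step per element, while the copy pays with a literal twice as large.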
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center