[Development] char8_t summary?
thiago.macieira at intel.com
Sat Jul 13 13:41:21 CEST 2019
On Friday, 12 July 2019 17:37:59 -03 Matthew Woehlke wrote:
> That said, I took a look at startsWith, and... surprise! It is *already
> a template*. So at least in that case, it isn't obvious why adding more
> combinations would be so terribly onerous.
Again, note how the template implicitly assumes things. A 3-character string
cannot be present at the beginning (startsWith), end (endsWith) or anywhere in
the middle (contains, indexOf, lastIndexOf) of a 2-character one, for example.
But a 2- or 3-byte UTF-8 string can be the prefix of a 1-character UTF-16
string, and a 4-byte UTF-8 string can be the prefix of a 2-code-unit UTF-16
string (1 character). That means implementing UTF-8 functions requires
different algorithms in the first place. That means templates are not usually
the answer.
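
To make the length mismatch concrete, here is a minimal sketch (not Qt code) that counts how many UTF-16 code units a UTF-8 string would occupy. The function name utf16LengthOfUtf8 is made up for illustration, and the code assumes well-formed UTF-8 with no validation. It shows why a "needle longer than haystack" early return cannot compare a byte count against a code-unit count.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of UTF-16 code units needed to represent
// the given UTF-8 input. Assumes the input is well-formed; no validation.
constexpr std::size_t utf16LengthOfUtf8(std::string_view utf8)
{
    std::size_t units = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80)
            continue;                  // continuation byte, already counted via its lead byte
        units += (c >= 0xF0) ? 2 : 1;  // 4-byte sequences need a surrogate pair
    }
    return units;
}

// 2 bytes (U+00E9) and 3 bytes (U+20AC) each fit in one UTF-16 code unit;
// 4 bytes (U+1F600) need two code units but are still one character.
static_assert(utf16LengthOfUtf8("\xC3\xA9") == 1);
static_assert(utf16LengthOfUtf8("\xE2\x82\xAC") == 1);
static_assert(utf16LengthOfUtf8("\xF0\x9F\x98\x80") == 2);
```

So a 3-byte needle is not "too long" for a 1-code-unit haystack, which is exactly the assumption the same-encoding template gets to make.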
I'm not saying impossible. You can, by writing sufficiently generic algorithms
that scan the strings in lockstep (you can scan UTF-8 backwards, after all).
But the reason you don't *want* to is that our Latin1 and UTF-16 algorithms
are optimised, often vectorised, for their purpose. We don't want to lose the
efficiency we've already got.
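
As a rough illustration of the "scan in lockstep" approach, the sketch below checks whether a UTF-16 string starts with a UTF-8 string by decoding one code point from each side at a time. All names (nextFromUtf8, nextFromUtf16, startsWithUtf8) are hypothetical and this is not how Qt implements anything; it assumes well-formed UTF-8 input and does no validation.

```cpp
#include <cstddef>
#include <string_view>

// Decode one code point from well-formed UTF-8, advancing pos.
inline char32_t nextFromUtf8(std::string_view s, std::size_t &pos)
{
    unsigned char lead = static_cast<unsigned char>(s[pos++]);
    if (lead < 0x80)
        return lead;                                   // ASCII
    int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);              // payload bits of the lead byte
    while (extra-- > 0 && pos < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    return cp;
}

// Decode one code point from UTF-16, combining a surrogate pair if present.
inline char32_t nextFromUtf16(std::u16string_view s, std::size_t &pos)
{
    char32_t hi = s[pos++];
    if (hi >= 0xD800 && hi <= 0xDBFF && pos < s.size()) {
        char32_t lo = s[pos];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++pos;
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;                                         // BMP code point or lone surrogate
}

// True if the UTF-16 haystack begins with the UTF-8 needle.
// Note: no length-based early return is possible across encodings.
inline bool startsWithUtf8(std::u16string_view haystack, std::string_view needle)
{
    std::size_t h = 0, n = 0;
    while (n < needle.size()) {
        if (h == haystack.size())
            return false;                              // haystack ran out first
        if (nextFromUtf16(haystack, h) != nextFromUtf8(needle, n))
            return false;
    }
    return true;
}
```

For example, startsWithUtf8(u"\u20AC", "\xE2\x82\xAC") is true even though the needle has three code units and the haystack has one. The loop is scalar and decodes every code point, so it gives up the kind of bulk, vectorisable comparison that a same-encoding algorithm can use.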
And I'm not saying we shouldn't have UTF-8 algorithms or even a
QUtf8StringView or some such. It would have helped in CBOR, for example, see
void appendTextString(const char *utf8, qsizetype len);
This is one that should at least get the overload.
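
A sketch of what such an overload could look like, using stand-in types only: Utf8View and SketchWriter below are hypothetical and are not Qt classes, and std::ptrdiff_t stands in for qsizetype so the example compiles without Qt. The point is just that the view overload can forward pointer and length without any transcoding.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a UTF-8 string view type; not a Qt class.
struct Utf8View {
    std::string_view bytes;
};

// Hypothetical writer that only records the bytes it is given.
class SketchWriter {
public:
    // Same shape as the function quoted above (qsizetype replaced by std::ptrdiff_t).
    void appendTextString(const char *utf8, std::ptrdiff_t len)
    {
        out.append(utf8, static_cast<std::size_t>(len));
    }

    // The overload: forwards the view's pointer and length unchanged.
    void appendTextString(Utf8View text)
    {
        appendTextString(text.bytes.data(),
                         static_cast<std::ptrdiff_t>(text.bytes.size()));
    }

    std::string out;
};
```

A caller holding UTF-8 data could then write w.appendTextString(Utf8View{"caf\xC3\xA9"}) and the five bytes would be passed through as they are.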
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products