[Development] Oslo, we have a problem</apollo 13> [char8_t]

Mon Jul 8 19:24:58 CEST 2019

On Monday, 8 July 2019 12:42:51 -03 Arnaud Clere wrote:
> -----Original Message-----
> From: Thiago Macieira <thiago.macieira at intel.com>
> 
> > I am not completely convinced of the benefit of adding of an owning UTF-8
> > string class, though I very much agree with a view over UTF-8 strings.
> > The reason is not the string class itself (alone it is definitely
> > useful), but the fact that it would muddy the waters as to what string
> > classes one should use in API. We might end up with some API using UTF-8
> > and some UTF-16.
> 
> Indeed, this is already the case : QJsonDocument::toJson() returns a
> QByteArray 

Which is the expected behaviour, as it returns something suitable for sending 
over a socket, piping to a process or saving in a file, like 
QCborValue::toCbor(), QDataStream, QTextStream (operating over a QByteArray), 
QXmlStreamWriter (operating over a QByteArray), QDomDocument::toByteArray(), 
etc.

It's just that, unlike those others, it is also a UTF-8 encoded text string. 
The XML ones, for example, can be configured to write in other encodings, 
and that information is stored in the XML header. CBOR and QDataStream are 
obviously binary.

> on which users can conveniently call toUpper() until some data
> from the field makes them understand it does not work...

And there's little we can do to prevent that. Even if we removed 
QByteArray::toUpper and left it only in QLatin1String, people would still find 
ways to uppercase. That's the reason I would prefer to keep it, with well-
defined and locale-independent semantics.
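
To illustrate what "well-defined and locale-independent" could mean for UTF-8 
data, here is a minimal sketch in standard C++. It is not Qt's implementation: 
asciiToUpper and asciiToUpperExample are hypothetical names, and the rule 
(touch only the ASCII letters a-z) is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a Qt API: uppercase only the ASCII letters a-z.
// Every byte of a multi-byte UTF-8 sequence is >= 0x80, so non-ASCII
// characters pass through unchanged and the result stays valid UTF-8.
std::string asciiToUpper(std::string bytes)
{
    for (char &c : bytes) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
    return bytes;
}

// Usage: the ASCII part is uppercased, the UTF-8 encoded "é" is left alone.
void asciiToUpperExample()
{
    assert(asciiToUpper("caf\xC3\xA9") == "CAF\xC3\xA9");
}
```

The result is predictable regardless of the process locale, but it is not a 
linguistically correct uppercase of the text: the "é" stays lowercase.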

> Working for a
> regulated industry, getting rid of potential bugs is my #1 concern, not
> that of having more fancy utf8 features! However, if deriving a QUtf8String
> from QByteArray is inappropriate (of which I am not totally convinced...
> cannot see a Liskov-Substitution-Principle violation in this case), I
> understand the task may be daunting. It may be argued too that COW is not
> interesting for such strings and APIs can be fixed by using u8string, but
> then, you ask Qt users to master both QString and std::string like APIs...

We don't ask users to use std::string APIs. std::string is not a text class; 
it is analogous to QByteArray. C++ does not have a text container class and 
that's not going to come until at least 2023 (C++2b).

std::string, like QByteArray, is encoding-agnostic but has some string-like 
convenience functions over pure byte storage (like std::vector<byte>), such as 
searching for a substring occurrence instead of single value_type elements. 
QByteArray does too, since we unified it with QCString in Qt 4.0.
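
A small sketch of what "encoding-agnostic" means in practice, using only 
standard C++. The helper utf8CodePoints is a hypothetical name introduced for 
illustration, and it assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 code points by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8CodePoints(const std::string &bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 bytes of "naïve" ("na\xC3\xAFve"), size() reports 6 because it 
counts bytes, while the helper reports 5 code points. find("\xC3\xAF") returns 
the byte offset 2: the substring search works, but only as a byte-sequence 
match, with no knowledge of the encoding.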

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products