[Development] Oslo, we have a problem</apollo 13> [char8_t]

Arnaud Clere arnaud.clere at minmaxmedical.com
Tue Jul 9 09:21:20 CEST 2019


> -----Original Message-----
> From: Thiago Macieira <thiago.macieira at intel.com> 
> > On Monday, 8 July 2019 12:42:51 -03 Arnaud Clere wrote:
> > > -----Original Message-----
> > > From: Thiago Macieira <thiago.macieira at intel.com>
> > > 
> > > I am not completely convinced of the benefit of adding an owning UTF-8 string class, though I very much agree with a view over UTF-8 strings.
> > > The reason is not the string class itself (alone it is definitely useful), but the fact that it would muddy the waters as to what 
> > > string classes one should use in API. We might end up with some API using UTF-8 and some UTF-16.
> > 
> > Indeed, this is already the case: QJsonDocument::toJson() returns a QByteArray
>
> Which is the expected behaviour, as it returns something suitable for transfer over a socket, pipe to a process or to be saved in a file, ...
>
> > on which users can conveniently call toUpper() until some data from the field makes them understand it does not work...
>
> And there's little we can do to prevent that. Even if we removed QByteArray::toUpper and left it only in QLatin1String, people would still find ways to uppercase. 

We could have a specific type (or trait?!) for QByteArrays containing UTF-8 data, which would let the compiler pinpoint some of these bugs; at present they can only be detected with the right input data.
If this *UTF-8* type can also be handled as a raw QByteArray, nothing changes for code that just transfers the bytes from one place to another.
I am pretty sure that letting the compiler know that the bytes returned by QJsonDocument::toJson() are actually UTF-8 would help fix latent bugs in a lot of code.
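To make the idea concrete, here is a minimal sketch of such a tagged type, under stated assumptions: std::string stands in for QByteArray so that it builds without Qt, and Utf8Bytes, toJsonSketch() and writeRaw() are hypothetical names, not existing Qt API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Assumption: std::string stands in for QByteArray so this builds without Qt.
using RawBytes = std::string;

// Hypothetical tag type (not a Qt class): a byte buffer known to hold UTF-8.
class Utf8Bytes {
public:
    explicit Utf8Bytes(RawBytes bytes) : m_bytes(std::move(bytes)) {}

    // Code that only moves bytes around (socket, pipe, file) keeps working:
    // the tagged buffer converts implicitly to the raw one.
    operator const RawBytes &() const { return m_bytes; }
    const RawBytes &bytes() const { return m_bytes; }

    // Deliberately no toUpper()/toLower(): case mapping UTF-8 needs a real
    // text API, so such a call becomes a compile error instead of a latent bug.

private:
    RawBytes m_bytes;
};

// Hypothetical: what a tagged QJsonDocument::toJson() might hand back.
// The name is spelled as explicit UTF-8 bytes: "Jérôme".
Utf8Bytes toJsonSketch() {
    return Utf8Bytes(RawBytes("{\"name\":\"J\xC3\xA9r\xC3\xB4me\"}"));
}

// Hypothetical sink standing in for a device write; returns bytes written.
std::size_t writeRaw(const RawBytes &data) { return data.size(); }

void demo() {
    Utf8Bytes json = toJsonSketch();
    assert(writeRaw(json) == 19);   // transfer code is unchanged
    // json.toUpper();              // would not compile: no such member
}
```

The tag is cheap to drop (implicit conversion to the raw bytes) but cannot be forgotten when calling charset-sensitive functions, which is exactly where the compiler could help.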

> That's the reason I would prefer to keep it, with well-defined and locale-independent semantics.

And I agree with your suggestion (elsewhere in the thread) that QByteArray functions operating on specific charsets, like toUpper(), should be restricted to the ASCII subset common to Latin-1 and UTF-8.
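For reference, ASCII-only semantics could look like the sketch below. asciiToUpper() is a hypothetical free function, with std::string again standing in for QByteArray; it does not claim to be Qt's implementation. Bytes outside 'a'..'z' pass through unchanged, so the result does not depend on the locale and UTF-8 input stays valid UTF-8.

```cpp
#include <cassert>
#include <string>

// Hypothetical ASCII-only uppercasing: only 'a'..'z' are mapped; every other
// byte (including UTF-8 lead and continuation bytes >= 0x80) is copied
// unchanged, regardless of the current locale.
std::string asciiToUpper(std::string s) {
    for (char &c : s) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
    return s;
}

void demoAsciiToUpper() {
    // "café" in UTF-8: the ASCII letters change, the two bytes of 'é' do not.
    assert(asciiToUpper("caf\xC3\xA9") == "CAF\xC3\xA9");
}
```

The price is that 'é' is not uppercased at all, but that is a visible, predictable limitation rather than corrupted bytes.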

> > It may also be argued that COW is not useful for such strings and that APIs can be fixed by using 
> > u8string, but then you ask Qt users to master both QString and std::string-like APIs...
>
> We don't ask users to use std::string APIs. That is not a text class, std::string is analogous to QByteArray. C++ does not have a text container class and that's not going to come until at least 2023 (C++2b).

I know. I was thinking aloud about the future std::u8string, which presumably has a std::string-like API that Qt users are not accustomed to. I am not advocating using it.
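The byte-versus-text distinction is easy to see in a small sketch: size() on std::string counts bytes, and std::u8string (C++20) exposes the same kind of interface over char8_t code units. countCodePoints() is a hypothetical helper that assumes valid UTF-8 and does no validation; a real text class would also have to deal with grapheme clusters, case mapping and so on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points by skipping UTF-8 continuation
// bytes (bit pattern 10xxxxxx). Assumes valid UTF-8; no validation is done.
std::size_t countCodePoints(const std::string &utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

void demoSizes() {
    // "Jérôme" spelled as explicit UTF-8 bytes, independent of source encoding.
    const std::string word = "J\xC3\xA9r\xC3\xB4me";
    assert(word.size() == 8);            // bytes, as with QByteArray::size()
    assert(countCodePoints(word) == 6);  // code points
}
```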
