[Development] Oslo, we have a problem</apollo 13> [char8_t]

Mon Aug 5 14:32:06 CEST 2019


> On 8 Jul 2019, at 15:34, Thiago Macieira <thiago.macieira at intel.com> wrote:
> 
> On Monday, 8 July 2019 04:24:28 -03 Mutz, Marc via Development wrote:
>> What I think when I read this is:
>> 
>> Backed by const char*, never implicit:
>> - QLatin1String - owner of L1 data [change from today, but not a
>> breaking one]
>> - QLatin1StringView - what QLatin1String is now [requires porting, but
>> it's just s/QLatin1String/QLatin1StringView/g in client code]
>> 
>> Backed by const char8_t*, implicit:
>> - QUtf8String - owner of UTF-8 data
>> - QUtf8StringView - view over UTF-8 data
>> 
>> Backed by const char16_t*, implicit (from char16_t*, Q*StringView, NOT
>> from QByteArray)
>> - QString - owner of UTF-16 data  [as before, possibly using char16_t
>> internally to avoid the tons of ushort casts]
>> - QStringView - view over UTF-16 data
>> 
>> Backed by const std::byte*, implicit:
>> - QByteArray - owner of std::byte data, no string-like functions
>> [breaking change, but anyway far in the future, as we can't depend on
>> std::byte, yet]
>> - QByteArrayView - view over std::byte (uchar, char, ...) data.
>> 
>> QByteArray, QUtf8String and QLatin1String(new) could use the same
>> backend, for zero-copy transformations between them.
>> 
>> Is this a realistic goal for Qt 7? Last time I proposed
>> QUtf8String/View, its usefulness was challenged. I think the advent of
>> char8_t in C++20 and std::byte in C++17 changes the game quite a bit,
>> though.
> 
> In a green field scenario, yes, that would be a realistic goal. 

In a green-field scenario, we would probably have a UTF-8-backed QString and not use UTF-16. But we are where we are...
> 
> I am not completely convinced of the benefit of adding an owning UTF-8 
> string class, though I very much agree with a view over UTF-8 strings. The 
> reason is not the string class itself (alone it is definitely useful), but the 
> fact that it would muddy the waters as to what string classes one should use 
> in API. We might end up with some API using UTF-8 and some UTF-16.

I am not convinced either, as it would very much complicate our API. If we want to do something, we should maybe consider one class that can operate on both encodings behind the scenes. If we do this, we should do it for both QString and QStringView, and with that get a relatively simple and consistent API. The price we’d pay is implementation complexity, and certain operations could no longer be inlined.
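
To make the "one class, two encodings" idea a bit more concrete, here is a minimal sketch in standard C++17. It is only an illustration under my own assumptions, not a proposed Qt API: DualEncodingView, isAsciiUnit, codeUnitCount and isAscii are hypothetical names, and std::string_view / std::u16string_view stand in for views over UTF-8 and UTF-16 data.

```cpp
#include <cstddef>
#include <string_view>
#include <variant>

// Hypothetical helpers: is a single code unit in the US-ASCII range?
constexpr bool isAsciiUnit(char c) noexcept { return static_cast<unsigned char>(c) < 0x80; }
constexpr bool isAsciiUnit(char16_t c) noexcept { return c < 0x80; }

// Hypothetical view that hides whether the data is UTF-8 or UTF-16.
class DualEncodingView
{
public:
    DualEncodingView(std::string_view utf8) : m_data(utf8) {}
    DualEncodingView(std::u16string_view utf16) : m_data(utf16) {}

    bool isUtf8() const { return std::holds_alternative<std::string_view>(m_data); }

    // Number of code units (bytes for UTF-8, 16-bit units for UTF-16).
    std::size_t codeUnitCount() const
    {
        return std::visit([](auto v) { return v.size(); }, m_data);
    }

    // Every operation has to dispatch on the stored encoding first.
    bool isAscii() const
    {
        return std::visit([](auto v) {
            for (auto c : v) {
                if (!isAsciiUnit(c))
                    return false;
            }
            return true;
        }, m_data);
    }

private:
    std::variant<std::string_view, std::u16string_view> m_data;
};
```

The API stays the same for both encodings, but each call goes through a runtime dispatch on the encoding, which is the kind of implementation cost meant above.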
> 
> But the biggest challenge is converting *every* *single* use of QLatin1String 
> to QLatin1StringView. We can introduce it as a direct alias right now, at some 
> point in late Qt 6 deprecate QLatin1String, at a point where people wouldn't 
> be trying to keep compatibility with Qt < 5.15, then reintroduce it in Qt 7.0.

I agree with Thiago. I’m OK with adding a QLatin1StringView class, but we’ll need to keep QLatin1String for Qt 6.
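
The first step of that plan (the new name as a direct alias of the old class) could look roughly like the sketch below. This is plain C++17 with stand-in types in a hypothetical namespace called sketch; it is not the real QLatin1String, and the length() function exists only for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Stand-in for today's QLatin1String: a non-owning view over Latin-1 bytes.
class Latin1String
{
public:
    explicit Latin1String(const char *s) noexcept : m_data(s), m_size(std::strlen(s)) {}
    const char *data() const noexcept { return m_data; }
    std::size_t size() const noexcept { return m_size; }

private:
    const char *m_data;
    std::size_t m_size;
};

// Step 1: the new name is a direct alias, so both spellings are the same type.
using Latin1StringView = Latin1String;

// Client code already ported to the new name still accepts the old one.
inline std::size_t length(Latin1StringView s) noexcept { return s.size(); }

static_assert(std::is_same_v<Latin1StringView, Latin1String>,
              "alias and original must be the same type during the transition");

} // namespace sketch
```

Because the alias is the same type, client code can switch names at its own pace; deprecating the old name and later reintroducing it as an owning class would be separate steps.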
> 
> I'm not sure we should go through all that trouble for three functions. People 
> don't want Latin1 case-insensitiveness, they want US-ASCII. It just so happens 
> that it was so easy for us to implement Latin1 in those functions that we did.
> 
> I propose we make a documented change in behaviour in 6.0 and remove the upper 
> half of the case tables of qbytearray.cpp:latin1_uppercased and 
> latin1_lowercased. That would make those functions operate fully on US-ASCII 
> only, which would in turn make them safe[*] for UTF-8 content too.

+1 for that change.
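
For the record, here is a small sketch of why the ASCII-only tables are safe for UTF-8 while the Latin1 upper half is not. It is standard C++17 and not the actual qbytearray.cpp code: asciiLower, latin1Lower and equalsAsciiCaseInsensitive are hypothetical names, and latin1Lower assumes the usual Latin-1 rule (0xC0..0xDE except 0xD7 map to +0x20) rather than copying Qt's tables.

```cpp
#include <cstddef>
#include <string_view>

// ASCII-only lowercasing: bytes >= 0x80 are never touched, so multi-byte
// UTF-8 sequences pass through unchanged.
constexpr unsigned char asciiLower(unsigned char c) noexcept
{
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + 0x20) : c;
}

// Latin-1 lowercasing (assumed rule, for contrast): it also rewrites bytes in
// the upper half, which are lead bytes when the data is actually UTF-8.
constexpr unsigned char latin1Lower(unsigned char c) noexcept
{
    if (c >= 'A' && c <= 'Z')
        return static_cast<unsigned char>(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return static_cast<unsigned char>(c + 0x20);
    return c;
}

// Case-insensitive for US-ASCII letters, case-sensitive for everything else.
constexpr bool equalsAsciiCaseInsensitive(std::string_view a, std::string_view b) noexcept
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (asciiLower(static_cast<unsigned char>(a[i]))
                != asciiLower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}
```

As an example, "É" is 0xC3 0x89 in UTF-8. The Latin-1 rule turns 0xC3 into 0xE3, which is the lead byte of a three-byte sequence, so the result is no longer valid UTF-8. The ASCII-only version leaves both bytes alone.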

Cheers,
Lars

> 
> [*] where "safe" is defined as ASCII-insensitive and non-ASCII sensitive. 
> There are some broken protocols like that, like DNS-SD (used in Zeroconf), 
> which uses UTF-8 encoding over US-ASCII case-insensitive DNS.
> 
> -- 
> Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel System Software Products