[Development] char8_t summary?

Wed Jul 10 15:10:07 CEST 2019

On 2019-07-10 14:55, André Pönitz wrote:
> On Wed, Jul 10, 2019 at 11:29:15AM +0200, Mutz, Marc via Development wrote:
>> On 2019-07-10 10:50, Arnaud Clere wrote:
>> > Hi all,
>> >
>> > So, do I understand correctly that:
>> > 1. QUtf8String may be required in Qt7 to solve problems due to C++2x
>> > char8_t
>> 
>> I wouldn't say required. I also don't think it needs to wait until Qt 7.
>> Qt 7 is where we may depend on C++20 and can use char8_t in the interface
>> and implementation, but we should certainly not wait for that to add the
>> class. It's certainly a good idea, IMO, to have views and owning
>> containers that operate on L1, UTF-8 and UTF-16 strings. The views are
>> more important.
>> 
>> > 2. QByteArray methods currently operating on latin1 may be restricted
>> > to ascii in Qt6 to avoid problems when const char* input really is
>> > utf8
>> 
>> I have no opinion on that.
>> 
>> > 3. QLatin1String may become QLatin1StringView by Qt7
>> 
>> Qt 6. We can add the name as an alias now, make QLatin1String an owning
>> container for Qt 6.0 (it breaks no code, just makes it slower, and the
>> port is trivial), and QLatin1StringView becomes what QLatin1String is now.
> 
> As far as I understand there's a perceived need to have "full" utf8
> literals, and there's a need to have ASCII literals. First could be
> served by some QUtf8*, second by QAscii*, both additions, no need to
> change QLatin* semantics.

L1 is special because it maps one-to-one onto the first 256 code points 
of Unicode, so conversion between L1 and UTF-16 will always be faster 
than between other encodings. This is why it makes sense to use all 8 
bits and have L1, rather than artificially restrict to US-ASCII strings. 
That's one reason: opportunism.
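
To illustrate the conversion-cost point: every L1 byte value equals its 
Unicode code point, so widening L1 to UTF-16 needs no lookup table and 
no multi-byte decoding. A minimal sketch in standard C++, assuming 
UTF-16 as the target; latin1ToUtf16 is a hypothetical helper for 
illustration, not a Qt API:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a Qt API): widen Latin-1 bytes to UTF-16.
std::u16string latin1ToUtf16(std::string_view latin1)
{
    std::u16string out;
    out.reserve(latin1.size());
    for (char c : latin1) {
        // Byte 0x00..0xFF is code point U+0000..U+00FF, so one byte
        // becomes exactly one UTF-16 code unit. Go through unsigned char
        // first so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

The loop has no branches and no failure path, which is what makes this 
direction cheap compared with decoding UTF-8.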

The other reason is about error checking: What should the result be of 
putting an æ into a QAsciiString? Assert at runtime? UB? In 
QLatin1String, this error just can't happen. Even if you feed it UTF-8, 
you may get mojibake, because you picked the wrong encoding, but it's 
not an error. Any UTF-8 octet sequence is a valid L1 string.
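
A small sketch of that difference, in standard C++ rather than Qt: the 
UTF-8 encoding of æ is the two octets 0xC3 0xA6. An ASCII-only type 
has to reject them somehow, while decoding them as L1 always succeeds 
and merely yields the wrong characters. isAscii and decodeLatin1 are 
hypothetical helpers, not Qt APIs:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: true if every byte is in the US-ASCII range.
bool isAscii(std::string_view bytes)
{
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) > 0x7F)
            return false; // an ASCII-only type must handle this case
    }
    return true;
}

// Hypothetical helper: decoding as Latin-1 has no failure case, because
// every possible octet corresponds to a code point.
std::u16string decodeLatin1(std::string_view bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    return out;
}
```

Feeding the two UTF-8 octets of æ to decodeLatin1 gives U+00C3 U+00A6, 
the classic mojibake, but never an invalid string; isAscii on the same 
input returns false, and the caller then has to decide what that means.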

So, I don't see QAscii* pulling its weight.

Thanks,
Marc