[Development] char8_t summary?
Mutz, Marc
marc at kdab.com
Wed Jul 10 15:10:07 CEST 2019
On 2019-07-10 14:55, André Pönitz wrote:
> On Wed, Jul 10, 2019 at 11:29:15AM +0200, Mutz, Marc via Development wrote:
>> On 2019-07-10 10:50, Arnaud Clere wrote:
>> > Hi all,
>> >
>> > So, do I understand correctly that:
>> > 1. QUtf8String may be required in Qt7 to solve problems due to C++2x
>> > char8_t
>>
>> I wouldn't say required. I also don't think it needs to wait until Qt 7.
>> Qt 7 is where we may depend on C++20 and can use char8_t in the interface
>> and implementation, but we should certainly not wait for that to add the
>> class. It's certainly a good idea, IMO, to have views and owning containers
>> that operate on L1, UTF-8 and UTF-16 strings. The views are more important.
>>
>> > 2. QByteArray methods currently operating on latin1 may be restricted
>> > to ascii in Qt6 to avoid problems when const char* input really is
>> > utf8
>>
>> I have no opinion on that.
>>
>> > 3. QLatin1String may become QLatin1StringView by Qt7
>>
>> Qt 6. We can add the name as an alias now, make QLatin1String an owning
>> container for Qt 6.0 (it breaks no code, just makes it slower, and the
>> port is trivial), and QLatin1StringView becomes what QLatin1String is now.
>
> As far as I understand there's a perceived need to have "full" utf8
> literals, and there's a need to have ASCII literals. First could be
> served by some QUtf8*, second by QAscii*, both additions, no need to
> change QLatin* semantics.
L1 is special because it's the first 256 code points of Unicode, so
conversion between L1 and UTF-16 will always be faster than between other
encodings. This is why it makes sense to use all 8 bits and have L1, not
artificially restrict to US-ASCII strings. That's one reason: opportunism.
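To illustrate why this conversion is cheap, here is a minimal sketch in plain standard C++ (not Qt code; the helper name latin1ToUtf16 is made up for this example). Each L1 byte has the same numeric value as its Unicode code point, so converting to UTF-16 is a per-byte widening with no tables, no multi-byte decoding and no failure path:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not a Qt API: widen Latin-1 bytes to UTF-16 code units.
std::u16string latin1ToUtf16(std::string_view latin1)
{
    std::u16string out;
    out.reserve(latin1.size()); // exactly one code unit per input byte
    for (char c : latin1) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

For example, the single L1 byte 0xE6 (æ) becomes the single UTF-16 code unit U+00E6.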
The other reason is about error checking: What should the result be of
putting an æ into a QAsciiString? Assert at runtime? UB? In
QLatin1String, this error just can't happen. Even if you feed it UTF-8,
you may get mojibake, because you picked the wrong encoding, but it's
not an error. Any UTF-8 octet sequence is a valid L1 string.
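A rough sketch of the difference, again in plain standard C++ with made-up names (isUsAscii and decodeAsLatin1 are assumptions for illustration, not Qt APIs). The ASCII check has an input it must reject, so some error policy is needed; the L1 decode accepts every byte:

```cpp
#include <string>
#include <string_view>

// Hypothetical check that a QAscii*-style class would need; not a Qt API.
bool isUsAscii(std::string_view bytes)
{
    for (char c : bytes) {
        if (static_cast<unsigned char>(c) > 0x7F)
            return false; // the caller now has to pick an error policy
    }
    return true;
}

// Latin-1 decoding is total: every byte 0x00..0xFF maps to the code point
// with the same value, so there is no input that can be rejected.
std::u16string decodeAsLatin1(std::string_view bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    return out;
}
```

Feeding the UTF-8 encoding of æ (bytes 0xC3 0xA6) to decodeAsLatin1 yields U+00C3 U+00A6, which displays as "Ã¦": mojibake, but a well-defined result. isUsAscii returns false for the same input.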
So, I don't see QAscii* pulling its weight.
Thanks,
Marc