[Development] Oslo, we have a problem</apollo 13> [char8_t]
Thiago Macieira
thiago.macieira at intel.com
Sat Jul 6 18:59:54 CEST 2019
On Saturday, 6 July 2019 11:09:36 -03 Mutz, Marc via Development wrote:
> > Anyway, QByteArray has *Latin1* text-manipulation functions (toUpper
> > and
> > toLower), its split(char) function will happily split on individual
> > bytes of a
> > UTF-8 multibyte sequence, so adding char8_t overloads seems just wrong
> > to me.
>
> const char* in Qt is always assumed to be UTF-8-encoded. You need to use
> QLatin1String to have it interpreted as Latin-1:
>
> https://doc.qt.io/qt-5/qstring.html#QString-8
> https://doc.qt.io/qt-5/qstring.html#QString-7
That's QString, not QByteArray.
But QByteArray is encoding-indeterminate, since it can carry any kind of data.
Arguably, toUpper() and toLower() should be removed, since
QByteArray(u8"Résumé").toLower()
is mojibake.
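To make the mojibake concrete, here is a minimal standalone sketch that does not use Qt at all. It assumes, as stated earlier in the thread, that toLower() maps each byte as if it were a Latin-1 character. The helpers latin1ToLowerByte() and latin1ToLower() are hypothetical names made up for this illustration, not Qt API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: lowercase one byte as if it were a Latin-1 character.
// Latin-1 uppercase letters are 'A'..'Z' and 0xC0..0xDE, except 0xD7 (the
// multiplication sign); each lowercase form sits 0x20 above the uppercase one.
inline unsigned char latin1ToLowerByte(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return static_cast<unsigned char>(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return static_cast<unsigned char>(c + 0x20);
    return c;
}

// Hypothetical helper: apply the byte-wise mapping to a whole buffer,
// without any knowledge of the encoding the bytes are really in.
inline std::string latin1ToLower(std::string bytes)
{
    for (char &ch : bytes)
        ch = static_cast<char>(latin1ToLowerByte(static_cast<unsigned char>(ch)));
    return bytes;
}
```

In UTF-8, "é" is the two bytes 0xC3 0xA9. Fed the UTF-8 bytes of "Résumé", latin1ToLower() turns each 0xC3 into 0xE3 and leaves 0xA9 alone. 0xE3 is the lead byte of a three-byte UTF-8 sequence, but only one continuation byte follows it, so the result is no longer valid UTF-8.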
In fact, QByteArray should use std::byte in functions like data(), but that's
unwieldy and breaks too much compatibility.
> > What did you try to use QByteArray with that showed problems?
>
> Just QByteArray(u8"Hello") already fails when compiled with -std=c++2a.
> And this is also why we need to fix it. The same compiles fine in C++17,
> and does the expected thing.
I think we need to talk to SG16.
We could add template overloads to all functions so that they accept char,
unsigned char, std::byte and char8_t without complaint. I agree with you that
this could result in explosive compile times[1]. But it also does not solve
the problem of what type data() / constData() and the iteration functions
return.
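A rough sketch of what such template overloads might look like, on a made-up class rather than on QByteArray itself. ByteBuffer and IsByteLike are hypothetical names for this illustration. The sketch assumes C++17 or later (for std::byte) and only enables char8_t when the compiler defines __cpp_char8_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical trait: true for the one-byte types named above.
template <typename T> struct IsByteLike : std::false_type {};
template <> struct IsByteLike<char> : std::true_type {};
template <> struct IsByteLike<unsigned char> : std::true_type {};
template <> struct IsByteLike<std::byte> : std::true_type {};
#ifdef __cpp_char8_t
template <> struct IsByteLike<char8_t> : std::true_type {};
#endif

// Hypothetical stand-in for a byte container; not the real QByteArray.
class ByteBuffer
{
public:
    // Pointer + size, accepted for any byte-like element type.
    template <typename T, std::enable_if_t<IsByteLike<T>::value, int> = 0>
    ByteBuffer(const T *data, std::size_t size)
        : m_bytes(reinterpret_cast<const char *>(data), size) {}

    // Array overload, assuming the array is a NUL-terminated literal,
    // so the trailing terminator is dropped.
    template <typename T, std::size_t N,
              std::enable_if_t<IsByteLike<T>::value, int> = 0>
    ByteBuffer(const T (&literal)[N])
        : ByteBuffer(literal, N - 1) {}

    std::size_t size() const { return m_bytes.size(); }

    // The storage is still handed back as plain char: the overloads do
    // not answer what data() should return.
    const char *data() const { return m_bytes.data(); }

private:
    std::string m_bytes;
};
```

With this, both ByteBuffer("Hello") and ByteBuffer(u8"Hello") compile, whether u8"Hello" has type const char[6] (C++17) or const char8_t[6] (C++2a). Every constructor is now a template, which is where the compile-time cost comes from, and data() still has to pick a single return type.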
I wouldn't mind a udata() function anyway, since there's a lot of code dealing
with "bytes" as unsigned char. Are we willing to add ubegin() and begin8()
too?
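For illustration only, this is roughly what a udata() / ubegin() pair could look like on a made-up class. SketchByteArray, uend() and countNonAscii() are hypothetical names for this sketch, not existing or proposed Qt API. begin8() is left out here: unlike unsigned char, char8_t is not one of the types the standard allows for inspecting arbitrary storage, so a plain cast would not be enough.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a byte container; not the real QByteArray.
class SketchByteArray
{
public:
    explicit SketchByteArray(std::string bytes) : m_bytes(std::move(bytes)) {}

    std::size_t size() const { return m_bytes.size(); }

    // Existing-style accessors: plain char, whose signedness depends on
    // the platform.
    const char *data() const { return m_bytes.data(); }
    const char *begin() const { return data(); }
    const char *end() const { return data() + m_bytes.size(); }

    // Unsigned views of the same storage. Reading char storage through
    // unsigned char is permitted, so a reinterpret_cast is sufficient.
    const unsigned char *udata() const
    { return reinterpret_cast<const unsigned char *>(m_bytes.data()); }
    const unsigned char *ubegin() const { return udata(); }
    const unsigned char *uend() const { return udata() + m_bytes.size(); }

private:
    std::string m_bytes;
};

// Example of byte-oriented code that wants unsigned values: count the
// bytes with the high bit set, with no casts at the call site.
inline std::size_t countNonAscii(const SketchByteArray &ba)
{
    std::size_t n = 0;
    for (const unsigned char *p = ba.ubegin(); p != ba.uend(); ++p) {
        if (*p >= 0x80)
            ++n;
    }
    return n;
}
```

The same loop written against begin() / end() would need a cast per byte, because comparing a plain char against 0x80 is always false on platforms where char is signed.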
[1] Please, no one say "Modules!" here, it's not a full solution, even if we
can use them in Qt 6's lifetime.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products