[Development] Oslo, we have a problem</apollo 13> [char8_t]

Sat Jul 6 18:59:54 CEST 2019

On Saturday, 6 July 2019 11:09:36 -03 Mutz, Marc via Development wrote:
> > Anyway, QByteArray has *Latin1* text-manipulation functions (toUpper and
> > toLower), its split(char) function will happily split on individual bytes
> > of a UTF-8 multibyte sequence, so adding char8_t overloads seems just
> > wrong to me.
> 
> const char* in Qt is always assumed to be UTF-8-encoded. You need to use
> QLatin1String to have it interpreted as Latin-1:
> 
> https://doc.qt.io/qt-5/qstring.html#QString-8
> https://doc.qt.io/qt-5/qstring.html#QString-7

That's QString, not QByteArray.

But QByteArray is encoding-indeterminate, since it can carry data in any
encoding or no text at all. Arguably, toUpper() and toLower() should be
removed, since

	QByteArray(u8"Résumé").toLower()
is mojibake.
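
To make the failure concrete, here is a minimal sketch in plain C++ that
applies a Latin-1 lowercase rule byte by byte to UTF-8 input. The helpers
latin1LowerByte() and latin1LowerBytes() are hypothetical stand-ins written
for this illustration; they are not Qt's implementation.

```cpp
#include <string>

// Hypothetical stand-in for a Latin-1 toLower() on one byte: A-Z and the
// Latin-1 capitals 0xC0-0xDE (minus 0xD7, the multiplication sign) map to
// their lowercase form by adding 0x20. Everything else is left alone.
inline unsigned char latin1LowerByte(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return static_cast<unsigned char>(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return static_cast<unsigned char>(c + 0x20);
    return c;
}

// Applies the rule to every byte, with no knowledge of multibyte sequences.
inline std::string latin1LowerBytes(std::string bytes)
{
    for (char &ch : bytes)
        ch = static_cast<char>(latin1LowerByte(static_cast<unsigned char>(ch)));
    return bytes;
}
```

Feeding it the UTF-8 bytes of "Résumé" (52 C3 89 73 75 6D C3 A9) turns each
lead byte C3 into E3 while the continuation bytes stay put, so the result is
no longer valid UTF-8.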

In fact, QByteArray should use std::byte in functions like data(), but that
would be unwieldy and would break too much existing code.

> > What did you try to use QByteArray with that showed problems?
> 
> Just QByteArray(u8"Hello") already fails when compiled with -std=c++2a.
> And this is also why we need to fix it. The same compiles fine in C++17,
> and does the expected thing.
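
For reference, the break can be reproduced without Qt. This is a sketch
under the assumption that the class only has a const char * constructor;
CharOnlyBuffer is a hypothetical stand-in, not QByteArray itself.

```cpp
#include <string>
#include <type_traits>

// Element type of a u8 string literal in the current language mode.
using U8Char =
    std::remove_cv_t<std::remove_reference_t<decltype(u8"Hello"[0])>>;

#if defined(__cpp_char8_t)
// C++20 mode: u8 literals are arrays of const char8_t, and a
// const char8_t * does not convert implicitly to const char *.
static_assert(std::is_same_v<U8Char, char8_t>);
static_assert(!std::is_convertible_v<const char8_t *, const char *>);
#else
// C++17 and earlier: u8 literals are plain arrays of const char.
static_assert(std::is_same_v<U8Char, char>);
#endif

// Hypothetical stand-in for a class that only accepts const char *.
struct CharOnlyBuffer
{
    std::string storage;
    CharOnlyBuffer(const char *s) : storage(s) {}
};

// True when CharOnlyBuffer(u8"Hello") would compile in this mode.
constexpr bool u8LiteralAccepted =
    std::is_constructible_v<CharOnlyBuffer, const U8Char *>;
```

With -std=c++17 the constant is true; with -std=c++2a and char8_t enabled it
is false, which matches the compile error described above.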

I think we need to talk to SG16.

We could add template overloads to every function so that they accept char,
unsigned char, std::byte and char8_t without complaint. I agree with you that
this could make compile times explode[1]. It also does not solve the problem
of what type data() / constData() and the iterator functions return.
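
A rough sketch of what such an overload could look like, using a
hypothetical ByteBuffer class and IsByteLike trait invented for this
example (neither is existing Qt API):

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical trait: the element types we are willing to treat as bytes.
template <typename T>
struct IsByteLike : std::bool_constant<
    std::is_same_v<T, char> || std::is_same_v<T, unsigned char> ||
    std::is_same_v<T, std::byte>
#if defined(__cpp_char8_t)
    || std::is_same_v<T, char8_t>
#endif
    > {};

// Hypothetical byte container; storage stays char-based, as in QByteArray.
class ByteBuffer
{
public:
    // One constrained template instead of four hand-written overloads.
    template <typename T, std::enable_if_t<IsByteLike<T>::value, int> = 0>
    ByteBuffer(const T *data, std::size_t size)
        : storage_(reinterpret_cast<const char *>(data), size) {}

    // The open question remains: the accessor can only have one return type.
    const char *data() const { return storage_.data(); }
    std::size_t size() const { return storage_.size(); }

private:
    std::string storage_;
};

// Non-byte element types are rejected at overload resolution.
static_assert(!std::is_constructible_v<ByteBuffer, const int *, std::size_t>);
```

Input is covered for all four types, but data() still hands back
const char *, so callers holding char8_t or std::byte need a cast on the way
out. Every function that gets this treatment is also one more template to
instantiate in each translation unit.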

I wouldn't mind a udata() function anyway, since there's a lot of code dealing 
with "bytes" as unsigned char. Are we willing to add ubegin() and begin8() 
too?
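
As a sketch of the unsigned variants only, assuming the names udata(),
ubegin() and uend() on a hypothetical UnsignedAccessBuffer class (uend() is
my addition to make the range usable):

```cpp
#include <string>
#include <utility>

// Hypothetical container with unsigned char accessors next to data().
class UnsignedAccessBuffer
{
public:
    explicit UnsignedAccessBuffer(std::string bytes)
        : storage_(std::move(bytes)) {}

    const char *data() const { return storage_.data(); }

    // Same bytes viewed as unsigned char; reading through unsigned char
    // is permitted for any object.
    const unsigned char *udata() const
    {
        return reinterpret_cast<const unsigned char *>(storage_.data());
    }
    const unsigned char *ubegin() const { return udata(); }
    const unsigned char *uend() const { return udata() + storage_.size(); }

private:
    std::string storage_;
};

// Typical "bytes" code: no sign-extension surprises when comparing to 0x80.
inline bool hasNonAscii(const UnsignedAccessBuffer &b)
{
    for (const unsigned char *p = b.ubegin(); p != b.uend(); ++p) {
        if (*p >= 0x80)
            return true;
    }
    return false;
}
```

I left begin8() out of the sketch on purpose: char8_t is not one of the
types allowed to alias arbitrary storage, so a plain reinterpret_cast to
const char8_t * is not an equally safe drop-in.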

[1] Please, no one say "Modules!" here; they are not a full solution, even if
we can use them in Qt 6's lifetime.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products