[Development] Oslo, we have a problem</apollo 13> [char8_t]

Mon Jul 8 09:46:28 CEST 2019

Hi all,

> -----Original Message-----
> From: Thiago Macieira <thiago.macieira at intel.com> 
>
> But QByteArray is encoding-indeterminate since it can carry any type. 
> Arguably, toUpper() and toLower() should be removed, since QByteArray(u8"Résumé").toLower() is mojibake.
...
> Are we willing to add ubegin() and begin8() too?

Instead of asking users to choose the correct QByteArray methods depending on the data it contains, why not let them state explicitly what it contains?

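#include <QByteArray>

//! Explicitly UTF-8 encoded byte array
class QUtf8String : public QByteArray
{
public:
    using QByteArray::QByteArray;  // inherit the QByteArray constructors
    QUtf8String(const QByteArray &o) : QByteArray(o) {}  // tag an existing byte array as UTF-8
    QUtf8String() = default;  // needed: the constructor above suppresses the implicit default one
};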

Such a QUtf8String can be used wherever a QByteArray is expected.
Qt implementors can fix QByteArray's toUpper(), split(), etc. without having to guess what to do.
Users only have to mark where they use UTF-8 to make sure they get the correct functions.
Having a copy-on-write (COW) QUtf8String that provides a migration path away from the ambiguous QByteArray seems in line with the addition of char8_t and u8* to the C++ standard.
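
To make the idea concrete without depending on Qt, here is a minimal sketch using only the standard library. ByteArray and Utf8String are hypothetical stand-ins for QByteArray and the proposed QUtf8String, and the two free toLower() overloads are assumptions for illustration, not Qt's implementation: the first lowers every byte as if it were Latin-1, the second lowers only ASCII and leaves multi-byte UTF-8 sequences untouched.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for QByteArray: raw bytes, encoding unknown.
class ByteArray {
public:
    ByteArray() = default;
    explicit ByteArray(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string &bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Hypothetical stand-in for the proposed QUtf8String:
// same bytes, but the type itself says "this is UTF-8".
class Utf8String : public ByteArray {
public:
    using ByteArray::ByteArray;
    Utf8String() = default;
    Utf8String(const ByteArray &o) : ByteArray(o) {}
};

// Encoding-unaware lowering: treats each byte as Latin-1
// (assumed behaviour, chosen only to show the failure mode).
inline ByteArray toLower(const ByteArray &in)
{
    std::string out = in.bytes();
    for (char &c : out) {
        const unsigned char u = static_cast<unsigned char>(c);
        const bool asciiUpper = u >= 'A' && u <= 'Z';
        const bool latin1Upper = u >= 0xC0 && u <= 0xDE && u != 0xD7;
        if (asciiUpper || latin1Upper)
            c = static_cast<char>(u + 0x20);
    }
    return ByteArray(out);
}

// UTF-8-aware overload: lowers ASCII only, so lead and
// continuation bytes of multi-byte sequences stay intact.
inline Utf8String toLower(const Utf8String &in)
{
    std::string out = in.bytes();
    for (char &c : out) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u >= 'A' && u <= 'Z')
            c = static_cast<char>(u + 0x20);
    }
    return Utf8String(out);
}

// Usage sketch: the same bytes, lowered according to the declared type.
inline void example()
{
    const std::string resume = "R\xC3\xA9sum\xC3\xA9";  // UTF-8 bytes of "Résumé"
    // Untagged bytes: the 0xC3 lead bytes become 0xE3, which is no longer valid UTF-8.
    assert(toLower(ByteArray(resume)).bytes() == "r\xE3\xA9sum\xE3\xA9");
    // Tagged as UTF-8: only the ASCII 'R' changes.
    assert(toLower(Utf8String(resume)).bytes() == "r\xC3\xA9sum\xC3\xA9");
}
```

Overload resolution picks the UTF-8-aware function as soon as the caller tags the data, which is the whole point of the proposal: the caller states the encoding once, and the library no longer has to guess. A real implementation would of course also have to lower non-ASCII code points, which this sketch deliberately skips.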