[Development] Are char literals L1 or U8 in Qt?

Volker Hilsheimer volker.hilsheimer at qt.io
Tue Jun 11 22:03:11 CEST 2024


> On 11 Jun 2024, at 21:08, Giuseppe D'Angelo via Development <development at qt-project.org> wrote:
> 
> Il 11/06/24 07:12, Thiago Macieira ha scritto:
>> I'm arguing that such code is likely already broken (producing mojibake) for
>> non-US-ASCII content, so having U+FFFD instead of mojibake is not worse. You
>> wouldn't be able to work around the issue by un-doing the improper encoding,
>> which means it would force users to fix their code.
> 
> Is it? I somehow suspect that there's a lot of code out there that does stuff like:
> 
>  string.indexOf('\xfc')   // search for ü
> 
> or similar.
> 
> (Usual disclaimer: not every developer is aware of encodings. Maybe they tried 'ü', and got a mysterious warning from the compiler, and the code didn't work; so they just put '\xfc' instead, and now it works -- ok, let's carry on.)
> 
> I'm not claiming that the situation is ideal, as we're clearly being inconsistent: `char` is being treated as UTF-8 or Latin1 depending on the context.
> 
> Yet, breaking a ~20 year behavior in "low-level code" is ... scary? It should require extraordinary motivation and care; we're probably talking about making 6.8->6.14 warn if someone passes a non-ASCII char to QASV/QChar(char)'s constructor, and change behavior to accept ASCII-only in 6.15?


I do agree that it makes more sense to assume that code that feeds a single `char` into a Qt API wants that character to be interpreted as Latin1. For one, because it has been like that forever, and it is still so in the case of e.g. QChar(char). Also, if the character value is outside the US-ASCII range, then the only alternative would be to interpret it as an incomplete UTF-8 sequence, which can’t be the right answer. QString::arg(char) (or operator+(char), for that matter) is not usable as a tool to assemble a valid string from individual UTF-8 code units, after all.
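
To illustrate the difference for a single byte such as '\xfc', here is a minimal sketch in plain C++ (no Qt). The helper names latin1ToCodePoint, isCompleteUtf8Sequence and singleByteUtf8ToCodePoint are made up for this example and are not Qt APIs; the U+FFFD fallback only mirrors the replacement-character behavior discussed in the quoted text, not what any particular Qt function does today.

```cpp
// Latin-1 maps every byte value 0x00..0xFF to the Unicode code point
// with the same numeric value, so a single char always decodes.
constexpr char32_t latin1ToCodePoint(char c)
{
    return static_cast<unsigned char>(c);
}

// In UTF-8, a single byte is a complete sequence only if it is in the
// US-ASCII range (below 0x80). Anything else needs more bytes or is invalid.
constexpr bool isCompleteUtf8Sequence(char c)
{
    return static_cast<unsigned char>(c) < 0x80;
}

// Hypothetical "treat the char as UTF-8" policy: ASCII passes through,
// everything else becomes U+FFFD (the replacement character).
constexpr char32_t singleByteUtf8ToCodePoint(char c)
{
    return isCompleteUtf8Sequence(c)
        ? static_cast<char32_t>(static_cast<unsigned char>(c))
        : U'\uFFFD';
}
```

Under these assumptions, '\xfc' decodes to U+00FC (ü) as Latin1, but can only become U+FFFD when read as UTF-8, since one byte at or above 0x80 never forms a complete sequence by itself.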

Volker


