[Development] Are char literals L1 or U8 in Qt?

Tue Jun 11 21:08:45 CEST 2024

On 11/06/24 07:12, Thiago Macieira wrote:
> I'm arguing that such code is likely already broken (producing mojibake) for
> non-US-ASCII content, so having U+FFFD instead of mojibake is not worse. You
> wouldn't be able to work around the issue by un-doing the improper encoding,
> which means it would force users to fix their code.

Is it? I somehow suspect that there's a lot of code out there that does 
stuff like:

   string.indexOf('\xfc')   // search for ü

or similar.

(Usual disclaimer: not every developer is aware of encodings. Maybe they 
tried 'ü', and got a mysterious warning from the compiler, and the code 
didn't work; so they just put '\xfc' instead, and now it works -- ok, 
let's carry on.)

I'm not claiming that the situation is ideal, as we're clearly being 
inconsistent: `char` is being treated as UTF-8 or Latin1 depending on 
the context.
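
To make the two interpretations concrete, here is a minimal sketch in plain C++ (no Qt involved). The helper names decodeAsLatin1 and decodeAsLoneUtf8 are hypothetical, made up for this illustration; they are not Qt functions and do not claim to mirror how any Qt constructor is implemented. They only show what each reading of a single `char` would produce:

```cpp
// Hypothetical helpers (not Qt API): two ways to interpret one `char`.

// Latin-1 reading: every byte 0x00..0xFF maps to the code point with the
// same numeric value, so 0xFC becomes U+00FC (ü).
constexpr char32_t decodeAsLatin1(char c)
{
    return static_cast<char32_t>(static_cast<unsigned char>(c));
}

// UTF-8 reading of a single code unit in isolation: only 0x00..0x7F is a
// complete character. Any other byte is a lead or continuation byte (or
// never valid at all) and cannot stand alone, so it is mapped to
// U+FFFD REPLACEMENT CHARACTER here.
constexpr char32_t decodeAsLoneUtf8(char c)
{
    return static_cast<unsigned char>(c) < 0x80
        ? static_cast<char32_t>(static_cast<unsigned char>(c))
        : static_cast<char32_t>(0xFFFD);
}

// The byte from the indexOf() example: the two readings disagree.
static_assert(decodeAsLatin1('\xfc') == 0x00FC, "Latin-1 reading gives U+00FC");
static_assert(decodeAsLoneUtf8('\xfc') == 0xFFFD, "lone UTF-8 byte gives U+FFFD");

// For US-ASCII input the two readings agree.
static_assert(decodeAsLatin1('a') == decodeAsLoneUtf8('a'), "ASCII is unaffected");
```

Under these assumptions, a search for '\xfc' finds ü today with the Latin-1 reading, and would look for U+FFFD instead with the UTF-8 reading; ASCII-only callers would see no difference.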

Yet breaking a ~20-year-old behavior in "low-level code" is ... scary? It
should require extraordinary motivation and care; we're probably talking
about making 6.8 through 6.14 warn if someone passes a non-ASCII char to
the QAnyStringView (QASV) or QChar(char) constructors, and then changing
the behavior to accept ASCII only in 6.15?

Thanks,
-- 
Giuseppe D'Angelo | giuseppe.dangelo at kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - Trusted Software Excellence
