[Development] Are char literals L1 or U8 in Qt?

Tue Jun 11 11:36:47 CEST 2024

Anyone iterating bytewise over a char[] in UTF-8 also has serious bugs, given that a single user-perceived character can span 8 bytes or more in UTF-8 (a national flag, for example, consists of two Unicode code points, each encoded as 4 bytes).
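
To illustrate the flag case, here is a minimal sketch in plain standard C++ (no Qt). The names countCodePoints and germanFlag are made up for this example, and the counting function assumes well-formed UTF-8 input:

```cpp
#include <cstddef>
#include <string_view>

// Count Unicode code points in a UTF-8 string by counting every byte that
// is not a continuation byte (continuation bytes have the form 10xxxxxx).
// Assumption: the input is well-formed UTF-8; no validation is done.
constexpr std::size_t countCodePoints(std::string_view utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

// The German flag: U+1F1E9 U+1F1EA (regional indicators D and E),
// written out as explicit UTF-8 bytes so the source encoding does not matter.
inline constexpr std::string_view germanFlag =
    "\xF0\x9F\x87\xA9\xF0\x9F\x87\xAA";
```

With these definitions, germanFlag.size() is 8 and countCodePoints(germanFlag) is 2, yet a reader sees one flag. A bytewise loop therefore visits eight values, none of which is a character on its own.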

David

-----Original Message-----
From: Development <development-bounces at qt-project.org> On Behalf Of Thiago Macieira
Sent: 10 June 2024 22:14
To: development at qt-project.org
Subject: Re: [Development] Are char literals L1 or U8 in Qt?

On Monday 10 June 2024 05:39:26 GMT-7 Marc Mutz via Development wrote:
> Since there are four bugs³ in QString::arg() that are all fixed by the 
> existing patch chain porting the whole thing to QAnyStringView, and 
> since the medium-term goal is to deprecate use of char for characters 
> and char[] for strings (QT_ASCII_WARN), anyway, I would like to fix
> QASV(char) to mean QASV(QChar(char)), not redefine char literals as
> UTF-8 and break many more users (QASV is relatively new; QChar(char) and
> QString::arg(char) are there since before Qt 4).

I am all for fixing the incompatibility, but I am of the opinion that
char-as-Latin1 was the wrong choice. It was wrong in Qt 4 and is still wrong now.
My point is that a const char[] is a UTF-8 string; therefore, each char in there is a UTF-8 code unit.

Anyone iterating character by character across two different encodings probably already has bugs.
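
The two readings of a lone char can be contrasted with a small sketch in plain standard C++. The helper names latin1CharToUtf8 and isCompleteUtf8Char are hypothetical and are not Qt API; they only show how the same byte means different things under each interpretation:

```cpp
#include <string>

// Read a single char as Latin-1 and return the UTF-8 encoding of that
// character. Latin-1 values map directly to U+0000..U+00FF, so anything
// at or above 0x80 needs two UTF-8 bytes.
std::string latin1CharToUtf8(char ch)
{
    const unsigned char u = static_cast<unsigned char>(ch);
    if (u < 0x80)
        return std::string(1, ch);
    std::string out;
    out += static_cast<char>(0xC0 | (u >> 6));   // lead byte: 110000xx
    out += static_cast<char>(0x80 | (u & 0x3F)); // continuation: 10xxxxxx
    return out;
}

// Read a single char as a UTF-8 code unit: it is a complete character
// on its own only if it is in the ASCII range.
constexpr bool isCompleteUtf8Char(char ch)
{
    return static_cast<unsigned char>(ch) < 0x80;
}
```

For example, the byte 0xE9 read as Latin-1 is "é", which latin1CharToUtf8 encodes as the two bytes C3 A9. Read as a UTF-8 code unit, the same byte is only the start of a multi-byte sequence, so isCompleteUtf8Char returns false for it. The two interpretations agree only for ASCII.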

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Principal Engineer - Intel DCAI Fleet Systems Engineering