[Development] Are char literals L1 or U8 in Qt?

Wed Jun 12 10:30:30 CEST 2024

Nope, just TWO code points, e.g. 🇺 (U+1F1FA: REGIONAL INDICATOR SYMBOL LETTER U) followed by 🇸 (U+1F1F8: REGIONAL INDICATOR SYMBOL LETTER S) for the US flag.

-----Original Message-----
From: Development <development-bounces at qt-project.org> On Behalf Of Giuseppe D'Angelo via Development
Sent: 11 June 2024 20:09
To: development at qt-project.org
Subject: Re: [Development] Are char literals L1 or U8 in Qt?

On 11/06/24 11:36, David C. Partridge wrote:
> Anyone iterating bytewise over a char[] in UTF-8 has also got serious 
> bugs given that a UTF-8 "graphic character" can be up to 8 bytes 
> (national flags comprise two UTF-8 code points).

There's no such thing as a UTF-8 "graphic character". Grapheme sequences are treated at a higher level anyhow in Qt, and we have APIs for that (QTextBoundaryFinder, etc.).

And it's not 2. 🏴󠁧󠁢󠁷󠁬󠁳󠁿 is 7 code points.

My 2 c,
--
Giuseppe D'Angelo | giuseppe.dangelo at kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - Trusted Software Excellence