[Development] Are char literals L1 or U8 in Qt?

Edward Welbourne edward.welbourne at qt.io
Wed Jun 12 10:51:26 CEST 2024


On 11/06/24 11:36, David C. Partridge wrote:
>>> Anyone iterating bytewise over a char[] in UTF-8 has also got
>>> serious bugs given that a UTF-8 "graphic character" can be up to 8
>>> bytes (national flags comprise two UTF-8 code points).

Giuseppe D'Angelo (11 June 2024 20:09) replied:
>> There's no such thing as a UTF-8 "graphic character". Grapheme
>> sequences are treated at a higher level anyhow in Qt, and we have
>> APIs for that (QTextBoundaryFinder, etc.).
>>
>> And it's not 2. 🏴󠁧󠁢󠁷󠁬󠁳󠁿 is 7 code points.

David C. Partridge (12 June 2024 10:30) replied:
> Nope just TWO code points e.g. 🇺 (U+1F1FA: REGIONAL INDICATOR SYMBOL
> LETTER U) followed by 🇸 (U+1F1F8: REGIONAL INDICATOR SYMBOL LETTER S)
> for the US flag,

Some confusion here.
That's two Unicode code points, each of which takes four bytes to
encode in UTF-8 (and two char16_t, a surrogate pair, to encode in
UTF-16, as QString does).
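
To make the counts for the US flag concrete, here is a minimal sketch in
plain C++ (no Qt). The helper names countCodePoints and countUtf16Units
and the constant kUsFlagUtf8 are made up for illustration, and the code
assumes its input is well-formed UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// UTF-8 bytes of U+1F1FA U+1F1F8 (the US flag), written as hex escapes so
// the result does not depend on the source file's encoding.
const std::string kUsFlagUtf8 = "\xF0\x9F\x87\xBA\xF0\x9F\x87\xB8";

// Count code points by counting every byte that is not a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
// Assumes well-formed UTF-8.
std::size_t countCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Count the UTF-16 code units the same text would need. A lead byte of
// the form 11110xxx starts a four-byte sequence, i.e. a code point above
// U+FFFF, which UTF-16 encodes as a surrogate pair (two code units).
// Every other code point fits in one code unit.
std::size_t countUtf16Units(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80)
            continue; // continuation byte, already accounted for
        count += ((byte & 0xF8) == 0xF0) ? 2 : 1;
    }
    return count;
}
```

For kUsFlagUtf8, size() gives 8 bytes, countCodePoints gives 2 and
countUtf16Units gives 4. The last figure is also what QString::size()
reports for that flag, since it counts UTF-16 code units rather than
code points.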

I'll trust Peppe's count is thus of bytes in UTF-8.

	Eddy.

