[Development] Why can't QString use UTF-8 internally?

Matthew Woehlke mw_triad at users.sourceforge.net
Wed Feb 11 17:49:49 CET 2015


On 2015-02-11 11:29, Thiago Macieira wrote:
> On Wednesday 11 February 2015 11:22:59 Julien Blanc wrote:
>> On 11/02/2015 10:32, Bo Thorsen wrote:
>>> 2) length() returns the number of chars I see on the screen, not a
>>> random implementation detail of the chosen encoding.
>>
>> How’s that supposed to work with combining characters, which are part of
>> Unicode?
> 
> That's true. And add that there are some zero-width characters too and some 
> characters that are double-width.

I'm not going to claim this is the *best* answer, but it is at least one
that seems logical: length() should be the number of times one must hit
backspace starting from the end of the text to erase the entire text.
IOW, the number of logical glyphs (roughly what Unicode calls grapheme
clusters). Double-width characters are one logical glyph. Combining
characters are not independently logical glyphs (e.g. 'ñ' is one glyph,
regardless of how it is encoded).

Conversely, I'm sure there are times when you need to know the number of
code units rather than glyphs (e.g. allocating memory to make a copy).
Possibly length() and size() should return different results. (Which is a
mess, but...)
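
To make the different counts concrete, here is a rough sketch in plain
C++17 (no Qt). All names in it (Lengths, measure, utf8_size_of,
utf16_size_of, is_combining_mark) are made up for illustration and are not
part of any Qt API. The glyph count is deliberately simplistic: it assumes
that only U+0300..U+036F are combining marks, whereas real grapheme
segmentation follows the Unicode text segmentation rules and needs proper
library support.

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 bytes needed to encode one code point.
constexpr std::size_t utf8_size_of(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Number of UTF-16 code units needed to encode one code point
// (anything above the BMP needs a surrogate pair).
constexpr std::size_t utf16_size_of(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// ASSUMPTION: only the "Combining Diacritical Marks" block is treated as
// combining. This is a stand-in for real grapheme cluster rules.
constexpr bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical result type: the four "lengths" under discussion.
struct Lengths {
    std::size_t utf8_bytes;
    std::size_t utf16_code_units;
    std::size_t code_points;
    std::size_t logical_glyphs;
};

constexpr Lengths measure(std::u32string_view text) {
    Lengths r{0, 0, 0, 0};
    for (char32_t cp : text) {
        r.utf8_bytes += utf8_size_of(cp);
        r.utf16_code_units += utf16_size_of(cp);
        ++r.code_points;
        // A combining mark attaches to the preceding base character, so it
        // starts a new glyph only when there is nothing before it.
        if (!is_combining_mark(cp) || r.code_points == 1) {
            ++r.logical_glyphs;
        }
    }
    return r;
}
```

With this sketch, measure(U"\u00F1") (precomposed 'ñ') gives 2 UTF-8
bytes, 1 UTF-16 code unit, 1 code point and 1 glyph, while
measure(U"n\u0303") ('n' plus combining tilde) gives 3 bytes, 2 code
units, 2 code points and still 1 glyph. The "backspace count" is the
glyph figure; the size of a buffer for a copy is one of the first two,
depending on the encoding.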

-- 
Matthew



