[Development] Why can't QString use UTF-8 internally?

Thiago Macieira thiago.macieira at intel.com
Wed Feb 11 18:00:19 CET 2015


On Wednesday 11 February 2015 11:49:49 Matthew Woehlke wrote:
> I'm not going to claim this is the *best* answer, but at least one that
> seems logical... length() should be the number of times one must hit
> backspace starting from the end of the text to erase the entire text.

That depends on the editor. Some remove the base character together with all
of its combining characters in one keystroke; others do not.

> IOW, the number of logical glyphs. Double-width characters are one
> logical glyph. Combining characters are not independently logical glyphs
> (e.g. 'ñ' is one glyph, regardless of how it is encoded).

Exactly. We don't have a function for that, though.
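
To make the three candidate meanings of "length" concrete, here is a minimal
sketch in plain standard C++. It is not Qt API: the helper names (codeUnits,
codePoints, logicalGlyphs) are made up for illustration, and logicalGlyphs is
a deliberate simplification that only recognises combining marks in the
U+0300..U+036F block. Real grapheme segmentation follows the Unicode text
segmentation rules and covers far more cases.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; none of these are Qt functions.

static bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Number of UTF-16 code units, i.e. what you need to size a copy.
std::size_t codeUnits(const std::u16string &s)
{
    return s.size();
}

// Number of Unicode code points: a valid surrogate pair counts once.
std::size_t codePoints(const std::u16string &s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
            ++i; // consume the low half of the pair
        ++n;
    }
    return n;
}

// Rough count of "logical glyphs". ASSUMPTION: only U+0300..U+036F are
// treated as combining marks; this is not full grapheme segmentation.
std::size_t logicalGlyphs(const std::u16string &s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        const bool combining = c >= 0x0300 && c <= 0x036F;
        if (combining && n > 0)
            continue; // attaches to the preceding base character
        if (isHighSurrogate(c) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
            ++i; // consume the low half of the pair
        ++n;
    }
    return n;
}
```

With these definitions, 'ñ' written as U+006E followed by U+0303 is two code
units and two code points but one logical glyph, while the precomposed U+00F1
is one of each. A character outside the BMP is two code units but a single
code point.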

> Conversely, I'm sure there are times when you need to know the number of
> codepoints (e.g. allocating memory to make a copy). Possibly length()
> and size() should return different results. (Which is a mess, but...)

Uh... no, that's not a good idea.

If we were going to do something like that, we'd have to find a less confusing
name. Something like width().

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



