[Development] Qt6: Adding UTF-8 storage support to QString
André Pönitz
apoenitz at t-online.de
Wed Jan 23 23:15:56 CET 2019
On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
> 23.01.2019, 16:55, "Edward Welbourne" <edward.welbourne at qt.io>:
> > All of this discussion ignores a major elephant: QString's indexing is
> > by 16-bit UTF-16 code units, not by Unicode characters. We've had Unicode
> > for a couple of decades now.
> >
> > We *should* have a string type (I don't care what you call it) that acts
> > on strings indexed by Unicode characters, not in terms of a
> > representation. Whether that string type internally uses UTF-16 or
> > UTF-8 should be invisible to its user. Ideally it would be capable of
> > carrying its data internally in either form (so as to avoid needless
> > conversion when both producer and consumer use the same form) and of
> > converting between the two (e.g. so as to append efficiently) as needed.
>
> I think this is excessive. Most common operations with strings in application
> code are:
>
> * Pass the string around or compare as an opaque token
> * Draw the string on screen e.g. with QPainter (while technically it
> falls in the previous category, I think it's important enough to
> deserve a separate item)
> * Find substring or pattern (regex) inside the string
> * Split the string by character, pattern, or index boundaries found by means
> of previous item
>
> I think the only common cases where dealing with Unicode grapheme clusters
> is required are
>
> * Handling of text cursor movement
> * Implementation of text shaping, i.e. what Harfbuzz is doing
>
> I think having a special iterator would be quite enough for the cursor case.
> Such an iterator could abstract away the underlying encoding, instead of
> forcing everyone to convert to UTF-16 first.
All of that is scarily close to my opinion on the topic.
Andre'
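
To make the iterator idea from the quoted text concrete, here is a minimal
sketch in plain standard C++ (no Qt) of encoding-agnostic iteration. The names
nextCodePoint() and codePoints() are hypothetical and not part of any Qt API.
The sketch assumes well-formed UTF-8 input, passes unpaired UTF-16 surrogates
through unchanged, and walks code points only, not grapheme clusters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode the code point starting at code unit i of a
// UTF-16 string and advance i past it. Unpaired surrogates are returned as-is.
char32_t nextCodePoint(const std::u16string &s, std::size_t &i)
{
    assert(i < s.size());
    const char16_t hi = s[i++];
    if (hi >= 0xD800 && hi <= 0xDBFF && i < s.size()) {
        const char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;  // consume the low surrogate as well
            return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                           + (char32_t(lo) - 0xDC00);
        }
    }
    return hi;
}

// Same interface for UTF-8. Assumption: the input is well-formed, so no
// validation of lead or continuation bytes is done here.
char32_t nextCodePoint(const std::string &s, std::size_t &i)
{
    assert(i < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80)
        return lead;  // single-byte (ASCII) sequence
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Caller-side code is identical for both encodings: the storage format is
// hidden behind the overloads above.
template <typename String>
std::vector<char32_t> codePoints(const String &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); )
        out.push_back(nextCodePoint(s, i));
    return out;
}
```

For the text "a" followed by U+1F600, the UTF-16 form has 3 code units and the
UTF-8 form has 5 bytes, yet codePoints() yields the same two code points for
both. Cursor movement would additionally need grapheme cluster boundaries,
which require Unicode segmentation data; in Qt, QTextBoundaryFinder provides
those for QString.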