[Development] Qt6: Adding UTF-8 storage support to QString

Edward Welbourne edward.welbourne at qt.io
Fri Jan 25 13:54:22 CET 2019


Arnaud Clère (25 January 2019 10:59) wrote:
> Most user code I have written or seen handles text data naively and is
> incorrect in some respect but I think only a minority of it is leading
> to real problems because input data will rarely trigger them.

That depends a lot on who's supplying your data.  The same rationale was
given for "making do" with old 8-bit encodings, which meant programs
worked for various rich nations' primary languages and didn't for anyone
else's.  Then we switched to UTF-16, which let us continue not thinking
about what we're really doing, while reaching a larger slice of the
world.  Still, that leaves us complicit in suppressing various minority
cultures by making software that works for the dominant culture around
them, but not for them.

Until we get into the habit of thinking of text properly (and I still
don't even know the terminology, so I have a way to go on this, just
like anyone) instead of as a sequence of evenly-sized units, we're going
to continue either being inefficient (because we use units that are
bigger than needed for many use-cases - arguably true of UTF-16) or
failing to properly support cultures whose scripts are relegated to the
outer planes of Unicode.  One example is the Chakma language's number
system, which QLocale currently can't represent (QTBUG-69324) because
its digits don't fit in a single UTF-16 unit.  QLocaleData expects
digits, signs and quotes to fit in one such unit, though it understands
that most of its other locale-specific texts might be longer.  As a
result, we can't support any Chakma locale.
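
To make the Chakma case concrete, here is a minimal sketch in plain
standard C++ (no Qt) of why such a digit cannot live in one UTF-16 unit.
The helper names toUtf16() and fitsInOneUtf16Unit() are made up for this
illustration and are not part of QLocale or QLocaleData; the only outside
fact assumed is that CHAKMA DIGIT ZERO is code point U+11136, which lies
beyond the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Qt API): encode one Unicode code point
// as a sequence of UTF-16 code units.
std::vector<char16_t> toUtf16(char32_t cp)
{
    // Only valid scalar values: within range and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    // Code points in the Basic Multilingual Plane take a single unit.
    if (cp < 0x10000)
        return { static_cast<char16_t>(cp) };

    // Everything beyond it needs a surrogate pair: the 20 bits left
    // after subtracting 0x10000 are split into two halves of 10 bits.
    const char32_t v = cp - 0x10000;
    return { static_cast<char16_t>(0xD800 + (v >> 10)),
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };
}

// Hypothetical helper: true if a field that is exactly one UTF-16
// unit wide could hold this code point.
bool fitsInOneUtf16Unit(char32_t cp)
{
    return toUtf16(cp).size() == 1;
}
```

With this, fitsInOneUtf16Unit(U'9') is true, and so is the check for
ARABIC-INDIC DIGIT ZERO (U+0660), but fitsInOneUtf16Unit(0x11136) is
false: the Chakma zero encodes as the pair 0xD804 0xDD36.  Any table
that reserves a single 16-bit unit for a digit has no room for it.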

By all means, let's make sure the internals are efficient for the more
common languages and scripts; but it's way past time to start doing
Unicode properly, so that all cultures are well-served by default, when
the software folk are using is built on Qt,

	Eddy.


