[Development] Unicode/i18n support

lars.knoll at nokia.com
Fri Nov 25 09:30:56 CET 2011


Hi,

I have been thinking a bit about how to move forward with Unicode support
in Qt lately. The current state is, in my opinion, not sustainable.

Unicode and i18n support consists of quite a few different tasks. Roughly
speaking, Unicode data and the code that handles it currently live in a
handful of places.

Let me try to list them here:

* (g)libc
	- codec conversion through iconv; Qt uses this for the native codec
	- collation, used by QString::localeAwareCompare()
	- local time <-> UTC conversion

  Collation in glibc is completely unsuitable for us, as it only works for
  the current locale and is UTF-8 based.
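
  To illustrate the "current locale only" limitation, here is a minimal
  sketch in plain C++ using only the C library. compareInLocale() is a
  hypothetical helper, not an existing Qt or glibc function. strcoll()
  takes no locale argument, so the sketch has to switch the process-wide
  LC_COLLATE setting and restore it afterwards, which is neither cheap nor
  thread safe.

```cpp
#include <clocale>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Qt or glibc): compare two byte strings
// under a named locale using the C library's strcoll().
// Returns false if the requested locale could not be selected.
bool compareInLocale(const char *localeName, const std::string &a,
                     const std::string &b, int *result)
{
    // Remember the current setting. setlocale() returns a pointer to storage
    // that later calls may overwrite, so take a copy.
    const char *current = std::setlocale(LC_COLLATE, nullptr);
    const std::string saved = current ? current : "C";

    // Switching the locale changes state shared by the whole process.
    if (!std::setlocale(LC_COLLATE, localeName))
        return false; // locale unavailable; the global state is unchanged

    *result = std::strcoll(a.c_str(), b.c_str());

    // Restore the previous setting. Any other thread calling strcoll() in
    // the meantime would have seen the temporary locale.
    std::setlocale(LC_COLLATE, saved.c_str());
    return true;
}
```

  The strings are passed as bytes in the locale's encoding, so a UTF-16
  QString would additionally have to be converted before every comparison.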

* Windows API
	- pretty much the same use cases as glibc

* Qt itself
	- data tables for the most important codecs
	- basic Unicode properties (still on an old Unicode version)
	- data for QLocale
	- name prep, etc. data for QUrl

* PCRE
	- as discussed in the mail thread

* ICU
	contains everything we need and more. Uses UTF-16 as the internal encoding.
	The "more" includes things such as:
		* Calendaring systems
		* Full (and fast) collation support
		* Timezone handling
		* Unicode 6.0
		* Full case folding support (including localized folding)
		* Localized data for cities, calendars and other stuff
		* Probably quite a few other things I forgot

My proposal would be to simplify this setup and start relying on ICU for
many of the tasks. We would still expose things through a Qt API though.
It would simplify the maintenance of our Unicode support, as we can rely
on ICU for most things.

ICU has the advantage that it works on every system we support. Except on
Windows, it's preinstalled on most systems, so there wouldn't be any
additional overhead on these platforms.

The ICU data file is rather big (as it's very complete), but can be
customized heavily. If you strip it down to support only what we currently
support in Qt, the data file should not be significantly bigger than what
we have right now.

At the same time, I'd propose that we (over time) stop relying on glibc-,
Windows- and Mac-specific APIs as much as possible. We could also remove
most Unicode-related data tables in Qt and keep only the ones that are
performance relevant (text layout relies on certain Unicode tables, and
it might be faster if we have inline access to these tables).
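
As a rough illustration of what "inline access" to a property table can
look like, here is a minimal two-stage lookup sketch. Everything in it is
an assumption made for the example: the Category enum, the table names and
the data (which covers ASCII only) are toy values, not Qt's actual tables.

```cpp
#include <cstdint>

// Illustrative property values; real tables carry far more information.
enum class Category : std::uint8_t { Other, Letter, Digit };

namespace {
constexpr int BlockShift = 4;             // 16 code points per block
constexpr int BlockSize = 1 << BlockShift;
constexpr char32_t TableLimit = 0x80;     // toy table: ASCII only

constexpr Category O = Category::Other;
constexpr Category L = Category::Letter;
constexpr Category D = Category::Digit;

// Stage 2: the distinct blocks of 16 property values. Identical blocks are
// stored once, which is what keeps such tables small.
constexpr Category stage2[][BlockSize] = {
    {O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O}, // 0: no letters/digits
    {D, D, D, D, D, D, D, D, D, D, O, O, O, O, O, O}, // 1: '0'..'9', then ':'..'?'
    {O, L, L, L, L, L, L, L, L, L, L, L, L, L, L, L}, // 2: '@' or '`', then 15 letters
    {L, L, L, L, L, L, L, L, L, L, L, O, O, O, O, O}, // 3: 11 letters, then 5 others
};

// Stage 1: one block index per 16 code points (0x00, 0x10, ..., 0x70).
constexpr std::uint8_t stage1[] = {0, 0, 0, 1, 2, 3, 2, 3};
} // namespace

// Inline lookup: one shift, one mask and two array reads, no function call
// into another library.
inline Category category(char32_t c)
{
    if (c >= TableLimit)
        return Category::Other; // outside the toy table
    return stage2[stage1[c >> BlockShift]][c & (BlockSize - 1)];
}
```

A lookup such as category(U'a') then compiles down to a couple of memory
reads, which is the kind of access a layout loop running per character
would benefit from.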

The things ICU supports that Qt doesn't currently offer could be exposed
through wrapper APIs over time. That task should be a lot simpler than it
would be today.

Opinions?

Lars



