[Development] Unicode/i18n support

Simon Hausmann simon.hausmann at nokia.com
Fri Nov 25 10:54:59 CET 2011


On Friday, November 25, 2011 08:30:56 AM ext lars.knoll at nokia.com wrote:
> Hi,
> 
> I have been thinking a bit lately about how to move forward with Unicode
> support in Qt. The current state is, in my opinion, not sustainable.
> 
> Unicode and i18n support consists of quite a few different tasks. Roughly
> speaking, we currently have a handful of places where Unicode data lives
> and where it is handled.
> 
> Let me try to list them here:
> 
> * (g)libc
> 	- codec conversion through iconv; Qt uses this for the native codec
> 	- collation, used by QString::localeAwareCompare()
> 	- local time <-> UTC conversion
> 
>   Collation in glibc is completely unsuitable for us, as it only works for
>   the current locale and is UTF-8 based.
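
To make the "current locale only" limitation concrete, here is a minimal
sketch (not Qt code) of what comparing two strings under an arbitrary locale
looks like with plain libc. The helper name compareInLocale is made up for
illustration. It assumes the inputs are already byte strings in the locale's
encoding, so a UTF-16 QString would have to be converted first.

```cpp
#include <clocale>
#include <cstring>
#include <string>

// Hypothetical helper (not a Qt or glibc API): compare two byte strings using
// the collation rules of `localeName`. strcoll() only consults the
// process-wide LC_COLLATE setting, so we must switch it and switch it back.
bool compareInLocale(const char *localeName,
                     const std::string &a,
                     const std::string &b,
                     int &result)
{
    // setlocale() returns a pointer to static storage, so copy the current
    // name before calling setlocale() again.
    const char *current = std::setlocale(LC_COLLATE, nullptr);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_COLLATE, localeName))
        return false; // requested locale is not installed on this system

    result = std::strcoll(a.c_str(), b.c_str());

    std::setlocale(LC_COLLATE, saved.c_str()); // restore the global state
    return true;
}
```

Calling compareInLocale("C", "B", "a", r) sets r to a negative value, since
the "C" locale collates by byte value. Because setlocale() changes state for
the whole process, a helper like this is not safe to call from several
threads at once. A collation API that takes the locale as a parameter
avoids the global switch altogether.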
> 
> * Windows API
> 	- pretty much the same use cases as glibc
> 
> * Qt itself
> 	- data tables for the most important codecs
> 	- basic Unicode properties (still on an old Unicode version)
> 	- data for QLocale
> 	- nameprep etc. data for QUrl
> 
> * PCRE
> 	- as discussed in the mail thread
> 
> * ICU
> 	- contains everything we need and more; uses UTF-16 as its internal
> 	  encoding
> 	- the "more" includes things such as:
> 		* calendaring systems
> 		* Full (and fast) collation support
> 		* Timezone handling
> 		* Unicode 6.0
> 		* Full case folding support (including localized folding)
> 		* Localized data for cities, calendars and other stuff
> 		* Probably quite a few other things I forgot
> 
> My proposal would be to simplify this setup and start relying on ICU for
> many of the tasks. We would still expose things through a Qt API though.
> It would simplify the maintenance of our Unicode support, as we can rely
> on ICU for most things.
> 
> ICU has the advantage that it works on every system we support. Except for
> Windows it's preinstalled on most systems, so there wouldn't be an
> additional overhead on these platforms.
> 
> The ICU data file is rather big (as it's very complete), but can be
> customized heavily. If you strip it down to support only what we currently
> support in Qt, the data file should not be significantly bigger than what
> we have right now.
> 
> At the same time I'd propose to (over time) stop relying on glibc-,
> Windows- and Mac-specific APIs as much as possible. We could also remove
> most Unicode-related data tables in Qt and only keep the ones that are
> performance-relevant (text layout relies on certain Unicode tables, and
> it might be faster if we have inline access to these tables).
> 
> The things ICU supports that Qt doesn't currently offer could be exposed
> through wrapper APIs over time. That task should be a lot simpler than it
> would be today.
> 
> Opinions?

I think it's a good idea, and with my WebKit hat on I'm all for it. It'll
simplify our code paths significantly and reduce maintenance overhead.


Simon


