[Development] Unicode/i18n support
Simon Hausmann
simon.hausmann at nokia.com
Fri Nov 25 10:54:59 CET 2011
On Friday, November 25, 2011 08:30:56 AM ext lars.knoll at nokia.com wrote:
> Hi,
>
> I have been thinking a bit on how to move forward with Unicode support in
> Qt lately. The current state is in my opinion not sustainable.
>
> Unicode and i18n support consists of quite a few different tasks. Roughly
> speaking, we currently have a handful of places where Unicode data lives
> and Unicode handling is done.
>
> Let me try to list them here:
>
> * (g)libc
> - codec conversion through iconv; Qt uses this for the native codec
> - collation, used by QString::localeAwareCompare()
> - local time <> UTC conversion
>
> Collation in glibc is completely unsuitable for us, as it only works for
> the current locale and is utf8 based.
>
> * Windows API
> - pretty much the same use cases as glibc
>
> * Qt itself
> - data tables for most important codecs
> - basic Unicode properties (still on an old Unicode version)
> - data for QLocale
> - nameprep and similar data for QUrl
>
> * PCRE
> - as discussed in the mail thread
>
> * ICU
> contains everything we need and more. Uses utf16 as the internal encoding.
> The "more" includes things such as:
> * calendaring systems
> * Full (and fast) collation support
> * Timezone handling
> * Unicode 6.0
> * Full case folding support (including localized folding)
> * Localized data for cities, calendars and other stuff
> * Probably quite a few other things I forgot
>
> My proposal would be to simplify this setup and start relying on ICU for
> many of the tasks. We would still expose things through a Qt API though.
> It would simplify the maintenance of our Unicode support, as we can rely
> on ICU for most things.
>
> ICU has the advantage that it works on every system we support. Except for
> Windows it's preinstalled on most systems, so there wouldn't be an
> additional overhead on these platforms.
>
> The ICU data file is rather big (as it's very complete), but can be
> customized heavily. If you strip it down to support only what we currently
> support in Qt, the data file should not be significantly bigger than what
> we have right now.
>
> At the same time I'd propose to (over time) get rid of relying on glibc,
> Windows- and Mac-specific APIs as much as possible. We could also remove
> most Unicode related data tables in Qt and only keep the ones that are
> performance relevant (text layout relies on certain Unicode tables, and
> it might be faster if we have inline access to these tables).
>
> The things ICU supports that Qt doesn't currently offer could be exposed
> through wrapper APIs over time. That task should be a lot simpler than it
> would be today.
>
> Opinions?
I think it's a good idea and with my WebKit hat on I'm all for it. It'll
simplify our code paths significantly and reduce maintenance overhead.
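
To make the glibc collation point concrete, here is a rough sketch of what a
localeAwareCompare()-style function has to do when it sits on top of
strcoll(). This is not Qt's actual implementation: toUtf8() and
compareInCurrentLocale() are made-up names, std::u16string stands in for
QString, and the sketch assumes the process locale uses UTF-8.

```cpp
#include <clocale>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: encode UTF-16 as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string toUtf8(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical stand-in for a locale-aware compare built on the C library.
// Both strings are converted on every call, and the ordering is whatever
// the process-wide locale (set via setlocale()) dictates.
int compareInCurrentLocale(const std::u16string &a, const std::u16string &b)
{
    const std::string a8 = toUtf8(a);
    const std::string b8 = toUtf8(b);
    return std::strcoll(a8.c_str(), b8.c_str());
}
```

std::strcoll() takes no locale argument, so sorting for any locale other than
the current one means changing process-wide state with setlocale(), and every
comparison pays for two conversions. ICU's collator, by contrast, is opened
for a named locale and compares UTF-16 strings directly.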
Simon