[Development] QtCS - ICU and Localization session

John Layt jlayt at kde.org
Fri Jul 26 20:33:40 CEST 2013


Hi,

A QtCS session was held to discuss the situation with ICU.  No
conclusions were reached as time ran out, but research tasks were set to
narrow the options.  Meeting minutes and detailed notes on my research
can be found at
http://qt-project.org/groups/qt-contributors-summit-2013/wiki/Qt_ICU.
What follows is a rather long summary of my thoughts.  There was no
time to discuss QDateTime and QTimeZone either; a separate email will
follow about those.

tl;dr: ICU or an equivalent native api is available on all platforms
except Win32, and could be abstracted in a new set of wrapper
classes providing a new advanced locale api.  A minimum legacy level
of support common to all platforms would still be publicly provided by
QLocale, including the new calendar and time zone features, using the
ICU abstraction layer where available, or a new Win32 backend, to
replace the existing locale database and code.  The old database could
still be available for embedded if required.  A build script could be
provided to simplify building a minimal required ICU for Win32 apps.
I'm working on proof-of-concept code for this and will post code in
2-3 weeks.

The original idea was to use ICU on all platforms to have a single
locale backend and to remove all the Qt locale data and code, system
locale code, and code conversion tables, saving 1-2MB from QtCore and
a lot of code maintenance.  The primary motivation was to enable
advanced features such as calendar systems, time zones and collation
without coding them ourselves or shipping the extra 2-6 MB of
translations and data required.  The real world makes this very hard
to do.

Problems with ICU itself
* ICU does not read the host system data or the user custom settings
so a system locale backend is still required
* No BC guarantee on C++ api so restricted to using C api which lacks
some required features

ICU Platform Support
* ICU supports all platforms, but not all platforms support ICU at a
system-install level, requiring some devs to ship their own copy.
* Linux and QNX always have a system library that can be used.
* OSX and iOS have a system library but no headers and linking is
banned from the app store.  The native api is a thin wrapper around
ICU and functionally equivalent.
* Windows does not have ICU and the native Win32 api cannot match ICU
functionality, but can provide full current QLocale functionality
including custom locales and some new functionality such as calendar
systems and time zones.
* WinRT does not have ICU but the native api is functionally
equivalent to ICU as it uses the same data source
* Android has ICU4J but not ICU4C by default, although it is
'supported'.  The ICU4J data file is a different format and not easily
shared with ICU4C.  Only native api is POSIX or JNI.  A system ICU4C
may be added in next version.
* Tizen has a system library but no headers and linking is banned from
the app store.  Native api is not yet known.  Not considered for now.

ICU Data Size
* Current binary download from ICU is 11 MB compressed / 27 MB on disk
which is too much for many Windows/iOS/Android devs to distribute
* ICU can be shrunk by adding local make files, but knowing which ones
to add and how is poorly documented
* Data is 80% of ICU, of which 27% is code conversion tables, 27%
translations, 10% Collation Rules, 5% Locale formats, and 31% required
data
* The most common code conversions, including Unicode, are done
algorithmically or with small tables.  Most large tables are EBCDIC or
East-Asian; an expert needs to review what we really need
* ICU translations support roughly 300 language variants, which most apps
will not need.  Much of this data is unnecessary and can be disabled
while still shipping the full set of locale formats
* Build switches reduce the library size if features are not required,
but the savings are not significant
* A typical custom build for all features but only a dozen supported
European languages might be 6 MB compressed / 12 MB on disk
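
To put the percentages above into rough absolute terms, the following
C++ sketch multiplies them out against the 27 MB on-disk figure.  It
assumes the 80% data share and the per-category percentages all apply to
the on-disk size, which the notes above do not state explicitly, and all
names in it are made up for illustration.

```cpp
// Rough size estimate only.  Assumption: the percentages quoted in the
// list above apply to the 27 MB on-disk size of the stock ICU download.
struct DataShare {
    const char *name;
    double fractionOfData;   // share of the ICU data, not of the whole library
};

constexpr double kOnDiskMB = 27.0;      // stock binary download, on disk
constexpr double kDataFraction = 0.80;  // "Data is 80% of ICU"

constexpr DataShare kShares[] = {
    {"code conversion tables", 0.27},
    {"translations",           0.27},
    {"collation rules",        0.10},
    {"locale formats",         0.05},
    {"required data",          0.31},
};

// Size of all ICU data in MB under the assumption above.
constexpr double dataMB() { return kOnDiskMB * kDataFraction; }

// Size of one data category in MB.
constexpr double shareMB(DataShare s) { return dataMB() * s.fractionOfData; }
```

Under that assumption the data comes to about 21.6 MB, of which code
conversion tables and translations would each be a little under 6 MB.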

QtWebKit
* Hard dependency on ICU for localization, code conversion tables,
text boundary analysis, etc
* WebKit has an abstraction api for host system localization but
no-one uses it, only ICU backend implemented
* Only way to remove ICU dependency is to write our own backend which
is a lot of work.  A primary motivation to move away from WebKit is to
avoid maintaining our own backends
* Chromium always builds and ships their own copy of ICU on all platforms

In summary: Linux, QNX, Mac, iOS, WinRT and Android (assuming JNI is
performant enough) all provide *system* level ICU or native
ICU-equivalent implementations that can be abstracted in a new locale
api; only Win32 does not.  Even then, any Win32 app needing QtWebKit
will also have ICU available.  It seems perverse to restrict all other
platforms from accessing what their native api provides simply because
Win32 has some limitations.  At the very least, we could be using this
abstraction layer as the QLocale backend for system and custom locales
where available, saving shipping the current database ourselves on
most platforms.

After looking more at the Win32 api, I think I can use it to provide
not just the system locale but also the custom locales for all
existing QLocale features and for the required new features such as
calendar systems and time zones.  Collation I still need to
investigate.  This would allow us to replace the existing Qt backend
on Windows as well.  We could effectively have a compile-time choice
of 4 backends for QLocale: ICU-equivalent, Win32, old Qt database, and
C.
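
A minimal sketch of what such a compile-time choice could look like in
C++ follows.  The macro names and the enum are hypothetical and exist
only for this illustration; they are not Qt configure flags, and the
real selection would presumably be driven by the Qt build system.

```cpp
// Hypothetical sketch: SKETCH_LOCALE_* macros are invented for this
// example and are not defined by Qt.
enum class LocaleBackend { IcuEquivalent, Win32, QtDatabase, CLocale };

// Picks exactly one backend at compile time, falling back to "C" when
// no backend macro is defined.
constexpr LocaleBackend selectedLocaleBackend()
{
#if defined(SKETCH_LOCALE_ICU_EQUIVALENT)
    return LocaleBackend::IcuEquivalent;
#elif defined(SKETCH_LOCALE_WIN32)
    return LocaleBackend::Win32;
#elif defined(SKETCH_LOCALE_QT_DATABASE)
    return LocaleBackend::QtDatabase;
#else
    return LocaleBackend::CLocale;
#endif
}

// Human-readable name, using the same labels as the paragraph above.
constexpr const char *backendName(LocaleBackend b)
{
    return b == LocaleBackend::IcuEquivalent ? "ICU-equivalent"
         : b == LocaleBackend::Win32         ? "Win32"
         : b == LocaleBackend::QtDatabase    ? "old Qt database"
         :                                     "C";
}
```

Building with, say, -DSKETCH_LOCALE_WIN32 would select the Win32
backend; with no macro defined the sketch falls through to "C".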

I also think it should be straightforward to add a build script to
3rdparty that simplifies building a minimal required ICU for those
Win32 apps that want it, i.e. read a config file for language
translations required, read the Qt config flags for features required,
download the source, create the required make files, then configure
and build.

This solution makes QLocale the "legacy" locale api which we guarantee
to apps is available and fully-functional on all platforms, which
should be all most apps will need, and provides a new advanced api for
those apps that need it.  We would then have the choice of either only
exporting the new classes on ICU-equivalent platforms, or always
exporting them but with reduced functionality on Win32.  This really is
the main contentious point and I'd like people's opinions on it:
* Having platform-dependent api is not very desirable, but here it is
mitigated by being entire classes rather than methods, and only on
Win32 where shared Qt installs are uncommon.
* Having a partly functional api could lead to bad user experiences,
but there is a precedent for this with QCollator which needs ICU to
actually do anything; on Mac/Win32 it does nothing.  It's safer.
* We could wait and see exactly what Win32 cannot provide before deciding

I know there was some opposition to having new locale classes, but
there are a number of good reasons not to add large chunks of new api to
QLocale:
* QLocale already has a fairly big api which has grown organically so
is somewhat inconsistent with itself; adding a lot more api that works
differently will just add to the confusion
* The monolithic design is at odds with the native api for Mac, ICU,
WinRT, .Net, Java and others where separate classes for number
formatters, date formatters, collators and calendars are the norm and
so familiar to devs.
* Mapping the separate ICU and Mac classes to QLocale will be messy
enough; it is a lot easier and cleaner to have wrapper classes
abstracting and managing each formatter, and we might as well make
those classes public
* It makes it clear that the new locale stuff may work differently to
the old api, e.g. date format codes, default settings, etc

I've started work on refactoring QLocale and my existing ICU api code
to try to make this work and should have a working proof-of-concept in a
couple of weeks.  At the very least, even without making the new
classes public this should result in a cleaner QLocale, an ICU backend
for use on Linux and QNX, improved Mac and Win32 support and smaller
library size on most platforms.

Thoughts?

John.


