[Development] Unicode/i18n support

Wed Nov 30 01:28:28 CET 2011

On Tuesday, 29 November 2011 21.41.58, John Layt wrote:
> I'm generally in favour, even if it means throwing away most of my work
> from the last few months :-)  In QLocale it will definitely save us a lot
> of code and maintenance, give advanced features at no extra cost, and
> solves the locale data size problem for embedded platforms.  However
> there's probably a few implications to work through before fully committing
> to it.

Hi John

That was a long email, so forgive me if I miss something, but do call out if 
it goes unanswered.

> I'm assuming we would use ICU for all parsing and formatting of numbers,
> currency, dates, times, etc?  And that we would continue to use the host
> system settings, i.e. where the user has set something other than the
> locale default?  ICU does provide api to define what settings to use, not
> just what locale is set, so this is covered.

I think we need to clean up the API such that base QtCore classes operate on a 
very well-defined and predictable locale: the C locale. For example, 
QString::number(1.1) always produces "1.1" and QString("1.1").toDouble() 
always produces 1.1, regardless of locale. The same applies to date, time 
parsing, calendars, etc.
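
As a rough illustration of what "always the C locale" means, here is a sketch 
using only the C++ standard library rather than Qt. The names formatC and 
parseC are hypothetical and are not part of any Qt API; the point is only that 
the stream is pinned to std::locale::classic(), so the result does not depend 
on the user's settings.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with the classic "C" locale,
// regardless of what the global locale is set to.
std::string formatC(double value)
{
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << value;
    return out.str();
}

// Hypothetical helper: parse a double with the classic "C" locale.
// Returns false if the text is not a number or has trailing characters.
bool parseC(const std::string &text, double &result)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    in >> result;
    if (in.fail())
        return false;
    // Anything left over (such as ",1" in "1,1") means the input was not
    // a plain C-locale number.
    return in.peek() == std::char_traits<char>::eof();
}
```

With these helpers, "1.1" round-trips the same way on every system, while 
"1,1" is rejected instead of being interpreted according to the user's locale.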

Then we have QLocale that provides access to locale-specific parsing and 
stringifying, with proper month names, numeric formatting, etc., with a 
convenient and easy way to access the locale-specific routines. That means we'd 
write code that selects the ICU locale ourselves, I think.

And on top of that, the user-visible classes and QML components operate on 
user-settings.

> Would we still keep the old Qt4 routines or discard them entirely? The
> existing number parsers/formatters are quite deeply embedded in various
> classes in particular for fast C locale parsing.  Removing them may have
> wider implications that need checking, for example for the QValidator
> classes I don't know if we can still have Intermediate states without our
> own parsers?

Unknown.

> There's highly likely to be subtle and not-so-subtle behavioural changes,
> e.g. in how certain formats are interpreted, strictness of parsing, etc.
> For example scientific notation in CLDR is usually 'E' but Qt4 always uses
> 'e'.  The date format codes in particular are different, and while my
> changes for Qt5 were switching to using the CLDR codes, I did include a
> compatibility mode to use the Qt4 codes which we couldn't do if we switch
> fully to ICU.

I think that "e" or "E" makes little difference... For example, for QUrl I 
changed the case used in percent-encoding -- it's now uppercase. If your code 
can't deal with that, it's broken anyway.
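
To make the percent-encoding point concrete, here is a minimal sketch of a 
decoder that accepts hex digits in either case, so "%2f" and "%2F" decode 
identically. It is standard C++ only and is not QUrl's implementation; 
hexValue and percentDecode are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit in either case, or -1.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode %XX sequences, case-insensitively.
// Malformed or truncated sequences are copied through unchanged.
std::string percentDecode(const std::string &input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '%' && i + 2 < input.size()) {
            const int hi = hexValue(input[i + 1]);
            const int lo = hexValue(input[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(input[i]);
    }
    return out;
}
```

Code written this way is unaffected by whether the producer emits uppercase or 
lowercase hex digits.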

> Also, if we're no longer using our own routines, but for example just
> reading the Windows format settings and passing those to the ICU routines,
> then wouldn't it just be better/quicker to call the Windows routines
> directly and save the read of the Windows settings?  We'd only use ICU if
> Windows didn't provide a feature.  While subtle differences might then
> appear between platforms, they would be consistent with all other apps
> running on those platforms?

Unknown.

> If we're breaking behaviour, will there also be room for more source
> incompatible changes to align QLocale more closely to CLDR/ICU, be more
> consistent with itself, or be more useful to KDE (see my earlier email
> about QSystemLocale and other stuff [1]).  We already have to break source
> compatibility slightly for the date/time api, and perhaps different api
> will make the behaviour changes more obvious?

We have had a quick discussion here in San Francisco about that. I think that 
global Unix settings need to be fixed in Unix, not in the desktop environment 
or the toolkits. We need to come up with proposals for this on the XDG 
settings hierarchy and bring it up during the Linux Plumbers Conference next 
year.

KDE keeping its own locale configuration separate from the system is just 
broken, just as is GNOME by keeping it in GConf, or Harmattan doing the same. 
Locale settings need to be accessible by non-GUI applications, so they need to 
be in a regular file.

> For Time Zones, while we can initially use ICU as a data source and
> backend, I think we will still need to read the host system Time Zones for
> compatibility purposes and as the ICU tz file may be older than the system
> tz file.  ICU will be a good source for consistent translations of the zone
> names.

Agreed. I'd even skip the ICU timezone data completely and go straight to the 
system tzdata (if that's available -- which it isn't on Windows). In fact, I 
don't know why ICU has this data at all, it shouldn't. The Olson database is 
updated several times a year, so we can definitely NOT use the data that was 
available at the time Qt was released.

> I've already been doing lots of work on QLocale so would be happy to work
> on this if needed, especially as I already have the date/time api sorted,
> and a lot of fixes to the Windows/OSX system locales.  I'll also rework my
> existing QDateTime changes to be done in two stages, internal QDate
> improvements and later QLocale/QCalendarSystem dependent changes.  Then I
> need to figure out how to get the KDE locale to work in this scheme.

I'd like to see your calendaring work finished soon. I think we can say we have 
two months at most before feature freeze, so that should go in soon...

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden