[Interest] Direct-lookup translation, even for English (was Re: get english translation)

Tue Sep 2 09:01:30 CEST 2014

On 1 Sep 2014, at 8:32 PM, Thiago Macieira wrote:

> On Monday 01 September 2014 12:50:23 Graham Labdon wrote:
>> Hi
>> My application is internationalized, however, in some circumstances I need
>> the English version of the string no matter what translator is being used.
>> Anyone have any suggestions on how to achieve this?
> 
> You have the English text. The way to get the English text from that is to use 
> it as-is.
> 
> Don't translate, don't transform it in any way.

Some projects like to use "programmer shorthand" for strings and then leave the final text up to a different team.  Such teams tend to be capricious and change the English text multiple times (PR and marketing reasons), so it's an advantage not to treat the English text as the key for looking up the translation.  (And then the shorthand can be written in any language the programmer likes, too.)

I had a previous job like that.  They had their own cross-platform cross-toolkit translation system, so I wasn't allowed to use tr() at all, and they convinced me that their way was better.  They wanted to have all the strings for all the languages inside the binary.  And there was a utility to generate an enum for the shorthand strings.  Imagine opening up Designer and creating a button, and setting its label to BUTTON_OK.  Then you run uic.  Then you post-process the generated header file so that it calls something like tr(BUTTON_OK), which uses the enum to look up the actual string at runtime, in a fixed-length array in which all strings for a particular language are stored.
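
A rough sketch of what that kind of lookup might look like, from memory and heavily simplified.  All the names here (StringId, Language, kStrings, lookup, lookupEnglish) are made up for illustration and are not the actual system; the enum and the tables would be generated by a tool rather than written by hand.

```cpp
#include <cassert>
#include <string>

// Hypothetical shorthand identifiers; a generator utility would emit
// this enum from the string database.
enum StringId { BUTTON_OK, BUTTON_CANCEL, STRING_COUNT };

// Hypothetical set of languages compiled into the binary.
enum Language { LANG_EN, LANG_DE, LANGUAGE_COUNT };

// One fixed-length array per language, indexed by StringId.
static const char *const kStrings[LANGUAGE_COUNT][STRING_COUNT] = {
    /* LANG_EN */ { "OK", "Cancel" },
    /* LANG_DE */ { "OK", "Abbrechen" },
};

static Language g_currentLanguage = LANG_EN;

void setLanguage(Language lang)
{
    assert(lang < LANGUAGE_COUNT);
    g_currentLanguage = lang;
}

// Stand-in for the tr()-like call that the post-processed header uses:
// a plain array index, with no file access at runtime.
std::string lookup(StringId id)
{
    assert(id < STRING_COUNT);
    return kStrings[g_currentLanguage][id];
}

// The English text regardless of the current language, which is what
// the original question asked for.
std::string lookupEnglish(StringId id)
{
    assert(id < STRING_COUNT);
    return kStrings[LANG_EN][id];
}
```

With a table like this, getting the English string is just a lookup in a different row, so it does not depend on which language is currently active.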

- Since the header modification is part of the build process, it costs the developer no time at all if the translation team wants to change the English text.  They do their work independently, and then the application is rebuilt.

- At runtime, this kind of lookup is much more efficient than the usual GNOME and KDE ways of searching the filesystem for separate translation files (but at the cost of bloating the binary somewhat).

- It is clear to everyone when the translation mechanism is not working or there is an untranslated string: you see the enum shorthand strings instead of the correct strings - no need to wait for a non-English-speaking tester to discover it.

- There will never be a user who cannot use the app because it wasn't installed correctly and therefore the translation file was not found: the strings are guaranteed to be present as long as someone did the translation work before it shipped.
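
To illustrate the point about untranslated strings being obvious, here is a minimal sketch, assuming the generator also emits the shorthand names as text and marks missing entries with a null pointer.  The names kNames, kFrench and lookupOrShorthand are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical generated identifiers.
enum StringId { BUTTON_OK, BUTTON_CANCEL, MENU_QUIT, STRING_COUNT };

// The shorthand names as text, assumed to be generated alongside the enum.
static const char *const kNames[STRING_COUNT] = {
    "BUTTON_OK", "BUTTON_CANCEL", "MENU_QUIT"
};

// Hypothetical French table; nullptr marks a string that nobody has
// translated yet.
static const char *const kFrench[STRING_COUNT] = {
    "OK", "Annuler", nullptr
};

// Returns the translated text, or the shorthand name itself when the
// entry is missing, so the gap is visible to anyone looking at the UI.
std::string lookupOrShorthand(const char *const table[], StringId id)
{
    assert(id < STRING_COUNT);
    return table[id] ? table[id] : kNames[id];
}
```

A button labelled MENU_QUIT stands out to any tester, whatever languages they happen to speak.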

The current way of using separate translation files should be kept as a fallback mechanism, though, so that people in countries which were neglected by the development team can still do their own translations.  This is the good part of the KDE way: the translation work can be distributed to lots of people around the world.  But it's also possible that they could contribute their work back to the original git repo, so that some future build of the application will have those strings built-in.
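
One way the two mechanisms could be combined, as a sketch only: use the built-in table when the language was compiled in, otherwise consult a catalog loaded from a separate translation file.  The file loading is left out here and the catalog is just a std::map standing in for it; builtinTable, ExternalCatalog and lookup are invented names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical generated identifiers and their names as text.
enum StringId { BUTTON_OK, BUTTON_CANCEL, STRING_COUNT };

static const char *const kNames[STRING_COUNT] = { "BUTTON_OK", "BUTTON_CANCEL" };
static const char *const kEnglish[STRING_COUNT] = { "OK", "Cancel" };

// Returns the table compiled into the binary for a language code, or
// nullptr if that language was not built in.
const char *const *builtinTable(const std::string &lang)
{
    if (lang == "en")
        return kEnglish;
    return nullptr;
}

// Stand-in for the contents of an external translation file, keyed by
// shorthand name.
typedef std::map<std::string, std::string> ExternalCatalog;

std::string lookup(const std::string &lang, const ExternalCatalog &external,
                   StringId id)
{
    assert(id < STRING_COUNT);
    // Built-in strings first: they cannot go missing at install time.
    if (const char *const *table = builtinTable(lang))
        return table[id];
    // Fallback: a community-supplied translation, if one has the entry.
    ExternalCatalog::const_iterator it = external.find(kNames[id]);
    if (it != external.end())
        return it->second;
    // Nothing found: show the shorthand so the gap is obvious.
    return kNames[id];
}
```

Keying the external catalog by shorthand name rather than by English text means a contributed translation survives later rewording of the English strings.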

I think it's a worthwhile goal that at least commercial Linux binaries ought to be self-contained and portable, so that there is no installation process beyond putting the binary in some directory which is in your PATH.  It wouldn't hurt if the free software community had the goal of zero-install binaries too.  The strings could be packaged as compressed resources, to reduce the total space consumed.  And perhaps resources which are not needed for the current language could even be expunged from memory (at the cost that you then cannot switch languages dynamically).

But even if it were not for the enum-lookup implementation that they insisted on, our assumption that English needs no translation does not fit the multi-team workflow.  There is often an assumption that programmers cannot design UIs because they are not artistic people or because they don't have psychology degrees or haven't studied usability enough.  This is why we have separate tools for UI designers to use.  It's very similar to the way that some people assume that programmers cannot write good English either.  Stereotypes in general are wrong, and offensive to people with multi-disciplinary abilities.  Nevertheless we can probably agree that UI design and the wording of text are often subject to later revision; so it's useful not to choose a changing string as a lookup key, for the same reason that imperative code should not assume too much about the declarative structure of the UI.