[Interest] cross-language matching of QStrings: partial success only?

Till Oliver Knoll till.oliver.knoll at gmail.com
Mon May 11 23:00:36 CEST 2015



> On 11.05.2015 at 20:37, Allan Sandfeld Jensen <kde at carewolf.com> wrote:
> 
>> On Monday 11 May 2015, Thiago Macieira wrote:
>>> On Monday 11 May 2015 19:07:31 René J.V. Bertin wrote:
>>> So apparently the string comparison does not do translation (which isn't
>>> really a surprise except when you look at the context in which I'm
>>> working here...).
>> 
>> How would the context change anything? String comparisons do a
>> character-by-character comparison. If any character is different, the
>> string won't match. They have nothing to do with translations.
>> 
>>> It seems the best option would be to ensure that cstyle and istyle are
>>> both in the same language. For that, I think I'd need to know the
>>> language in which the elements of "styles" are stored, and then
>>> translate "style" (cstyle) into that language.
>>> 
>>> How does one do that?
>> 
>> I have neither understood your objective nor your solution. I read your
>> email three times and I can't get either.
>> 
>> But my suggestion is that you restrict yourself to exact string matches.
>> Don't do partials. Either you have the complete translation to be matched
>> or you don't.
>> 
>> Better yet, don't compare translations. Compare originals only.
> 
> I think the problem is that fonts are not always named in English and often use
> strings instead of defined enums. This means they can have their style names
> in the language of the font origin or the language of the operating system.

It's been a looooong time since I last fiddled around with parsing TTF file headers.

From what I see now in the OTF spec:

  https://www.microsoft.com/typography/otspec/name.htm

you have "name records" which can contain all sorts of information: font family, sub-family, copyright, ...

Each name record is associated with an Encoding ID and a Language ID. It would be no fun if those IDs were interpreted the same way across platforms (that's why we have cross-platform toolkits such as Qt to deal with that mess ;)): that's where the Platform ID comes into play (1 = Macintosh, 3 = Windows etc.). Of course there are also platform-independent Language Tag records for Language ID values >= 0x8000, where language tags are like "en", "fr-CA".

It goes without saying that Language Tags may or may not be supported on a given platform. The same goes for "platform specific" Language IDs: so a Windows platform may or may not understand the Macintosh Language IDs and vice versa.

I mean, where would the fun be!

The Name ID finally denotes the meaning of the name (in the language and encoding that you may have figured out by now): 0 = Copyright, 1 = Font Family, 2 = Font Subfamily etc.
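To make the record layout concrete, here is a minimal sketch (untested against real fonts) that walks the name records of a raw 'name' table. It assumes the spec layout of a six-byte header followed by 12-byte records of six big-endian 16-bit values; the function names and the dictionary keys are made up for illustration, and finding the table inside the font file is left out.

```python
import struct

def decode_name(platform_id, encoding_id, raw):
    """Decode one name string according to its platform/encoding IDs."""
    # Windows (3) and Unicode (0) platform strings are UTF-16BE.
    if platform_id in (0, 3):
        return raw.decode("utf-16-be", errors="replace")
    # Macintosh (1) with encoding 0 is Mac Roman; other Macintosh
    # script encodings are not handled in this sketch.
    if platform_id == 1 and encoding_id == 0:
        return raw.decode("mac_roman", errors="replace")
    # Placeholder fallback (an assumption, not from the spec).
    return raw.decode("latin-1")

def parse_name_records(table: bytes):
    """Parse the header and name records of a raw 'name' table.

    `table` is assumed to hold the bytes of the table itself.
    """
    # Header: format, record count, offset to the string storage.
    _format, count, string_offset = struct.unpack_from(">HHH", table, 0)
    records = []
    for i in range(count):
        # Each record: platform, encoding, language, name ID,
        # string length and string offset (all uint16, big-endian).
        (platform_id, encoding_id, language_id,
         name_id, length, offset) = struct.unpack_from(">6H", table, 6 + 12 * i)
        start = string_offset + offset
        records.append({
            "platform_id": platform_id,
            "encoding_id": encoding_id,
            "language_id": language_id,
            "name_id": name_id,
            "text": decode_name(platform_id, encoding_id,
                                table[start:start + length]),
        })
    return records
```

The string offsets are relative to the start of the string storage, not to the start of the table, which is why the header offset is added for every record.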

It is my understanding that it is up to the font manufacturer which Names to provide in which language ("To keep this table short, the font manufacturer may wish to make a limited set of entries in some small set of languages; later, the font can be “localized” and the strings translated or added.").

So if you are lucky you always have the "English names" available (it faintly rings a bell that I once saw an "englishFontFamily" function in the Win32 API) - if not, well, you might have to deal with "French Fonts" ;)

But it really seems to be in the spirit of the inventor that you look up a specific name string by a combination of Platform/Encoding/Language/Name IDs: "Clients that need a particular string can look it up by its platform ID, character encoding ID, language ID and name ID. Note that some platforms may require single byte character strings, while others may require double byte strings."
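A rough sketch of such a lookup, with a fallback that tries the usual English entries. It assumes the records are already available as a list of dictionaries; that representation, the key names and both function names are hypothetical. The numeric IDs are the ones from the spec: Windows language ID 0x0409 is English (United States), and Macintosh language ID 0 is English.

```python
# Well-known IDs from the 'name' table specification.
PLATFORM_MACINTOSH = 1
PLATFORM_WINDOWS = 3
NAME_FAMILY = 1
NAME_SUBFAMILY = 2
WINDOWS_EN_US = 0x0409   # Windows language ID: English (United States)
MACINTOSH_ENGLISH = 0    # Macintosh language ID: English

def find_name(records, platform_id, encoding_id, language_id, name_id):
    """Return the first string matching all four IDs, or None."""
    wanted = (platform_id, encoding_id, language_id, name_id)
    for rec in records:
        key = (rec["platform_id"], rec["encoding_id"],
               rec["language_id"], rec["name_id"])
        if key == wanted:
            return rec["text"]
    return None

def english_name(records, name_id):
    """Try the usual English entries; None if the font has none."""
    candidates = [
        (PLATFORM_WINDOWS, 1, WINDOWS_EN_US),        # Windows, Unicode BMP
        (PLATFORM_MACINTOSH, 0, MACINTOSH_ENGLISH),  # Macintosh, Roman
    ]
    for platform_id, encoding_id, language_id in candidates:
        text = find_name(records, platform_id, encoding_id,
                         language_id, name_id)
        if text is not None:
            return text
    return None
```

Returning None instead of guessing keeps the "no English name in this font" case visible to the caller, which then has to decide what to do with a localised name.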

All said, I don't even know how Qt deals with that: e.g. does QFontInfo::family() return the English name, if available? Or the "localised name" depending on the system language? Or "first found, first returned" (aka "undefined behaviour")?

I don't think you could solve a "font matching" problem on the Qt API layer - not without knowing the "language ID" of the returned font family name.

But IF you have that, you could do heuristic matching by:

- always comparing English values
- if a name is not in English, translating it into English first by
- having a predefined set of "supported languages" for typical font names and styles: "bold", "fett", "gras" etc.
- Have fun!

;)
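Just to illustrate the heuristic above, a tiny sketch. The translation table is deliberately incomplete and entirely hypothetical: only "bold", "fett" and "gras" come from the list above, the italic entries are added as an assumption to show a second style, and a real table would need many more languages and words.

```python
# Hypothetical, deliberately tiny table: localised style word -> English.
STYLE_WORDS_TO_ENGLISH = {
    "fett": "bold",        # German
    "gras": "bold",        # French
    "kursiv": "italic",    # German (assumed entry)
    "italique": "italic",  # French (assumed entry)
}

def normalise_style(style: str) -> str:
    """Case-fold a style name and translate each known word to English."""
    words = style.casefold().split()
    return " ".join(STYLE_WORDS_TO_ENGLISH.get(word, word) for word in words)

def styles_match(a: str, b: str) -> bool:
    """Exact comparison, but only after both names are normalised."""
    return normalise_style(a) == normalise_style(b)
```

With this, styles_match("Fett", "Bold") holds while styles_match("Fett", "Italic") does not. The comparison itself stays an exact match, in line with the advice quoted above not to do partial matches; only the normalisation step is heuristic.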

Cheers,
  Oliver

