[Development] format-like tr()

Thiago Macieira thiago.macieira at intel.com
Tue Oct 22 19:58:59 CEST 2024


I've been pondering std::format for a while and I think for Qt we need a 
different approach. Our most important use of a format()-like API would be 
translated strings. I think we need to understand that before we tackle 
debugging output and other non-categorised uses.

Before anything, is anyone aware of research in this area? Any conclusions we 
should reuse?

1) Plural handling
The most significant difference in formatting of generic strings and of 
translated strings is the handling of some plural forms. QTranslator uses %n 
to denote the key number that will select the translated form. How do we mark 
them in {}?

 a) keep the "n" and make it mandatory, as in {n}
 b) keep the "n" when more than one replacement is present, optional otherwise
    "{n} file(s) found: {}"
    "{} file(s) found"
 c) simply use the first argument and adopt standard std::format rules to refer 
to it
    "{1}, {0} file(s) found"
    "{} file(s) found"

The advantage of {n} is that it tells the translators and the tooling that 
this is a plural-handling string and requires however many different forms in 
the translation. For that reason, I'd choose (a).

If we use a mandatory {n}, do other replacements also need to be numbered in 
the source code?
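
To make (a) concrete, here is a rough sketch of what the runtime side could 
look like. Everything in it is an assumption for illustration: the names 
pluralFormIndex() and formatPlural() are made up, the plural rule is a 
hard-coded English-like stand-in for the per-language rules the translation 
files carry, and only the {n} token is handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plural rule for English-like languages:
// form 0 when n == 1, form 1 for everything else.
std::size_t pluralFormIndex(long long n)
{
    return n == 1 ? 0 : 1;
}

// Picks the translated form for n and substitutes every "{n}" with the number.
// 'forms' holds the translator-supplied variants, in rule order.
std::string formatPlural(const std::vector<std::string> &forms, long long n)
{
    if (forms.empty())
        return {};
    std::size_t idx = pluralFormIndex(n);
    if (idx >= forms.size())
        idx = forms.size() - 1;   // fall back to the last form supplied

    std::string out = forms[idx];
    const std::string token = "{n}";
    const std::string number = std::to_string(n);
    std::size_t pos = 0;
    while ((pos = out.find(token, pos)) != std::string::npos) {
        out.replace(pos, token.size(), number);
        pos += number.size();     // continue after the inserted digits
    }
    return out;
}
```

With the forms "{n} file found" and "{n} files found", a count of 3 selects 
the second form. The point is that the literal {n} is what lets the tooling 
know it must ask the translator for several forms.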

2) Naming the arguments
One thing that can help translators a great deal is naming the replacement 
tokens, such as "{error_type}: could not connect: {error_msg}". This is 
possible in the Python original because it interpolates actual variables, but 
C++ does not do that. Yet the syntax is there.

Should we explore this? We could have a dummy name in the C++ code, which 
won't affect the ordering at all, but could be extracted to the translation 
file. In turn, the tooling could replace the names with numbers by matching 
IDs.
  trFormat("{error_type}: could not connect: {error_msg}", type, msg);
tooling extracts to .ts / .po:
  "{error_type}: could not connect: {error_msg}"
translator writes
  "Connexion échouée ({error_msg}). {error_type}"
tooling writes to .qm / .mo:
  "Connexion échouée ({1}). {0}"

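The name-to-number step is simple enough to sketch. This is only an 
illustration under assumptions: placeholderNames() and toPositional() are 
hypothetical helpers, not existing lupdate/lrelease code, and the sketch 
ignores "{{" escapes and any ":spec" part inside the braces.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Collects placeholder names from the source string, in order of first use.
// The position in the returned vector is the argument index.
std::vector<std::string> placeholderNames(const std::string &source)
{
    std::vector<std::string> names;
    std::size_t pos = 0;
    while ((pos = source.find('{', pos)) != std::string::npos) {
        const std::size_t end = source.find('}', pos);
        if (end == std::string::npos)
            break;
        std::string name = source.substr(pos + 1, end - pos - 1);
        if (std::find(names.begin(), names.end(), name) == names.end())
            names.push_back(std::move(name));
        pos = end + 1;
    }
    return names;
}

// Rewrites "{name}" in the translation to "{index}", using the source order.
// Names that do not occur in the source are left untouched.
std::string toPositional(const std::string &source, const std::string &translation)
{
    const std::vector<std::string> names = placeholderNames(source);
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t open = translation.find('{', pos);
        const std::size_t close = open == std::string::npos
                ? std::string::npos : translation.find('}', open);
        if (close == std::string::npos) {
            out.append(translation, pos, std::string::npos);  // trailing text
            break;
        }
        out.append(translation, pos, open - pos + 1);  // text up to and including '{'
        const std::string name = translation.substr(open + 1, close - open - 1);
        const auto it = std::find(names.begin(), names.end(), name);
        out += it != names.end() ? std::to_string(it - names.begin()) : name;
        out += '}';
        pos = close + 1;
    }
    return out;
}
```

Fed the source string and the French translation above, toPositional() 
yields the "({1}). {0}" form shown for the .qm file. The C++ call site never 
sees the names, so argument order stays purely positional.
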
3) Formatting options in translation files
Unlike QString::arg(), the formatting options for std::format are in the 
format string, as in "Foo {:^10.2%EX} bar". Is there any condition under which 
translators would want to change those options? I ask this because:
  a) those can confuse translators
  b) if translators cannot change them, we could parse the formatting 
    specifiers at compile time (if possible)

I suspect the answer is that translators would never change padding, 
alignment, precision or width (some of which may make no sense in translatable 
strings) but may want to change some of what is getting formatted. For 
example, the %EX above is used in the chrono formatter as "the locale's 
alternate time representation" and translators may feel that %EX is better 
than %X or vice-versa for their string. Date/time is especially important 
here, because a sentence may require different grammar forms of days of the 
week ("On Tuesday, Thiago wrote" versus "The alarm will happen next Tuesday"). 
Thus, I'd make the parsing of the padding, alignment, precision, width (PADW) 
happen at compile-time, like it is done today for QString::arg(), but allow 
the translators to select variants (of which QString::arg() only has one 
today, "L" for locale formatting of numbers).

Along the same lines, are translation files allowed to duplicate the same 
replacement token? If so, are they allowed to specify different padding, 
alignment or precision? That is, could they replace a "{1:%X}" time 
representation with "{1:%H}:{1:02%M}" ?
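
As a rough illustration of the split I have in mind, the sketch below 
separates a format spec into the layout part (PADW, fixed by the programmer) 
and the variant part (what a translator might swap). FieldSpec and 
splitSpec() are hypothetical names, and the rule "everything from the first 
'%' onwards is the variant" is my simplifying assumption: it matches 
chrono-style specs but would misread a '%' used as the fill character.

```cpp
#include <string_view>

// The two halves of a format spec such as "^10.2%EX".
struct FieldSpec {
    std::string_view layout;   // padding, alignment, width, precision
    std::string_view variant;  // e.g. a chrono-style conversion such as "%EX"
};

// Splits at the first '%'. Usable at compile time on string literals.
constexpr FieldSpec splitSpec(std::string_view spec)
{
    const auto pos = spec.find('%');
    if (pos == std::string_view::npos)
        return {spec, {}};
    return {spec.substr(0, pos), spec.substr(pos)};
}

// The layout half can be checked at compile time, as suggested above.
static_assert(splitSpec("^10.2%EX").layout == "^10.2");
static_assert(splitSpec("^10.2%EX").variant == "%EX");
static_assert(splitSpec(">8").variant.empty());
```

Under that split, a translation replacing %EX with %X would touch only the 
variant half and leave the compile-time-checked layout alone.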

4) What formatting options make sense for translatable strings?
I don't think anyone should be talking about padding and alignment of strings 
inside strings, especially in GUI and with variable-width fonts, none of which 
the QtCore formatters can have a clue about. It's probably better to allow a 
form of rich text formatting instead, like we do now:
  trFormat("<p>Error found while copying:\n<center>{}</center>", errorString);

Or do we simply not restrict at all? Let programmers face the ire of 
translators and UX designers if they use weird things that don't work in GUIs.

5) Should the same parser and same rules apply to other uses of formatting?
Aside from i18n, there would be two more cases for formatting strings inside 
of Qt: logging (including debugging) and generic string formatting. Thus, code 
like:
  QString fname = qFormat("{}-{:%FT%T%z}.log", 
          qApp->applicationName(), QDateTime::currentDateTime());
  qFDebug("Opening log file {}", fname);

In the case above, we used the formatters for QString and QDateTime (whether 
QDateTime would use std::chrono-like formatters or its own is beside the 
point). We'd preferably have all this machinery only once in our codebase.

This is the reason I want to have this conversation now: we need to focus on 
translated strings, but if we want to have the machinery only once and 
behaviour-compatible, we need to know what it will be before we start 
supporting std::format().

This raises the question for our string formatters: should we default to quoted 
& escaped? C++23 introduces the "?" specifier for its own strings, but we could 
flip the default for our loggers and instead introduce a "raw" mode. This and 
discussion on how we format our own types can be had later.
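
For reference, this is roughly what I mean by "quoted & escaped". It is a 
minimal sketch, not a proposal for the final behaviour: quoted() is a 
hypothetical helper, and it escapes only a handful of characters, which is 
less than what the C++23 "?" specifier covers.

```cpp
#include <string>
#include <string_view>

// Returns s wrapped in double quotes, with a few characters escaped so the
// result stays on one line and the string's boundaries are unambiguous.
std::string quoted(std::string_view s)
{
    std::string out;
    out.reserve(s.size() + 2);
    out += '"';
    for (char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}
```

A logger that defaulted to this would pass strings through quoted() unless 
the format string asked for the "raw" mode.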

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Principal Engineer - Intel DCAI Platform & System Engineering