[Development] format-like tr()

Mårten Nordheim marten.nordheim at qt.io
Thu Oct 24 13:11:13 CEST 2024


Hello!

I'm not sure {n} is great; it would clash with any potential name-matching.
If we keep the formatting specification in the string anyway, it would make
more sense to put the plural marker in there, e.g. "{:#}" or something similar.
If a translation moves the {...} fields into a different order, the translator
would need to number the values.
Of course, if we use std::format directly then we couldn't do it for types we
don't control, but nothing would stop us from adding something like
qTrFmt("...", ....); that passes every argument (at least integers in this case)
through a wrapper that we control.
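
To make the wrapper idea a bit more concrete, here is a minimal sketch in plain
standard C++ (no Qt, no std::format). TrArg and trFmtSketch() are made-up names
for illustration only, not existing or proposed API. The sketch only understands
"{}" and "{N}", and it skips anything after a ':' instead of interpreting it:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical wrapper: the one place where we decide how a value is rendered.
struct TrArg {
    std::string text;

    template <typename T>
    explicit TrArg(const T &value)
    {
        std::ostringstream os;
        os << value; // a real wrapper could apply locale rules here
        text = os.str();
    }
};

// Hypothetical stand-in for qTrFmt(): every argument goes through TrArg first.
// A field that is neither empty nor a number (e.g. "{n}") makes std::stoul throw.
template <typename... Args>
std::string trFmtSketch(std::string_view fmt, const Args &...args)
{
    const std::vector<TrArg> wrapped{TrArg(args)...};
    std::string out;
    std::size_t next = 0; // index consumed by the next unnumbered "{}"
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') {
            out += fmt[i];
            continue;
        }
        const std::size_t close = fmt.find('}', i);
        if (close == std::string_view::npos) {
            out += fmt.substr(i); // unterminated field: copy as-is
            break;
        }
        std::string_view field = fmt.substr(i + 1, close - i - 1);
        field = field.substr(0, field.find(':')); // drop the format spec
        std::size_t index;
        if (field.empty())
            index = next++;
        else
            index = std::stoul(std::string(field));
        if (index < wrapped.size())
            out += wrapped[index].text;
        i = close;
    }
    return out;
}
```

With this, trFmtSketch("{1}, {0} file(s) found", 3, "dir") yields
"dir, 3 file(s) found", and a "{:#}" field is treated like a plain "{}".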

(When I was still poking at QStringFormatter I was doing named arguments,
but implemented as a templated struct you could pass in as a 'special' argument,
e.g. QStringFormatter("{a:L}").arg(QStringFormatter::qArg("a", 15.5)), but that
was overly wordy.)
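
Roughly what I mean by a templated struct passed as a 'special' argument, as a
standalone sketch. This is not the actual QStringFormatter code; NamedArg,
namedArg() and NamedFormatter are hypothetical names, and the spec after ':'
(the "L" above) is simply ignored here:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical 'special' argument: a name paired with a value.
template <typename T>
struct NamedArg {
    std::string_view name;
    T value;
};

template <typename T>
NamedArg<T> namedArg(std::string_view name, T value)
{
    return {name, std::move(value)};
}

class NamedFormatter {
public:
    explicit NamedFormatter(std::string pattern) : m_text(std::move(pattern)) {}

    // Replaces every "{name}" or "{name:spec}" field matching the argument.
    template <typename T>
    NamedFormatter &arg(const NamedArg<T> &a)
    {
        std::ostringstream os;
        os << a.value;
        const std::string rendered = os.str();
        std::size_t pos = 0;
        while ((pos = m_text.find('{', pos)) != std::string::npos) {
            const std::size_t close = m_text.find('}', pos);
            if (close == std::string::npos)
                break;
            std::string_view field(m_text.data() + pos + 1, close - pos - 1);
            field = field.substr(0, field.find(':')); // name without the spec
            if (field == a.name) {
                m_text.replace(pos, close - pos + 1, rendered);
                pos += rendered.size(); // do not rescan the inserted text
            } else {
                pos = close + 1;
            }
        }
        return *this;
    }

    const std::string &str() const { return m_text; }

private:
    std::string m_text;
};
```

Usage would look like
NamedFormatter("{a:L} of {b}").arg(namedArg("a", 15.5)).arg(namedArg("b", 20)),
which shows the wordiness problem quite well.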

Mårten

> -----Original Message-----
> From: Development <development-bounces at qt-project.org> On Behalf Of
> Thiago Macieira
> Sent: Tuesday, 22 October 2024 19:59
> To: development at qt-project.org; Volker Krause <vkrause at kde.org>; albert
> astals cid <aacid at kde.org>
> Subject: [Development] format-like tr()
> 
> I've been pondering std::format for a while and I think for Qt we need a
> different approach. Our most important use of a format()-like function would
> be translated strings. I think we need to understand that before we tackle
> debugging output and other non-categorised uses.
> 
> Before anything, is anyone aware of research in this area? Any conclusions we
> should reuse?
> 
> 1) Plural handling
> The most significant difference between the formatting of generic strings and
> of translated strings is the handling of plural forms. QTranslator uses %n
> to denote the key number that will select the translated form. How do we
> mark them in {}?
> 
>  a) keep the "n" and make it mandatory, as in {n}
>  b) keep the "n" when more than one replacement is present, optional
> otherwise
>     "{n} file(s) found: {}"
>     "{} file(s) found"
>  c) simply use the first argument and adopt standard std::format rules to refer
> to it
>     "{1}, {0} file(s) found"
>     "{} file(s) found"
> 
> The advantage of {n} is that it tells the translators and the tooling that
> this is a plural-handling string and requires however many different forms in
> the translation. For that reason, I'd choose (a).
> 
> If we use a mandatory {n}, do other replacements also need to be numbered in
> the source code?
> 
> 2) Naming the arguments
> One thing that can greatly help the translators is naming the replacement
> tokens, such as "{error_type}: could not connect: {error_msg}". This is
> possible in the Python original because it interpolates actual variables, but
> C++ does not do it. Yet the syntax is there.
> 
> Should we explore this? We could have a dummy name in the C++ code, which
> won't affect the ordering at all, but could be extracted to the translation
> file. In turn, the tooling could replace the names with numbers by matching
> IDs.
>   trFormat("{error_type}: could not connect: {error_msg}", type, msg);
> tooling extracts to .ts / .po:
>   "{error_type}: could not connect: {error_msg}"
> translator writes
>   "Connexion échouée ({error_msg}). {error_type}"
> tooling writes to .qm / .mo:
>   "Connexion échouée ({1}). {0}"
> 
> 3) Formatting options in translation files
> Unlike QString::arg(), the formatting options for std::format are in the
> format string, as in "Foo {:^10.2%EX} bar". Is there any condition under which
> translators would want to change those options? I ask this because:
>   a) those can confuse translators
>   b) would allow us to parse the formatting specifiers at compile time (if
>     possible)
> 
> I suspect the answer is that translators would never change padding,
> alignment, precision or width (some of which may make no sense in
> translatable strings) but may want to change some of what is getting
> formatted. For
> example, the %EX above is used in the chrono formatter as "the locale's
> alternate time representation" and translators may feel that %EX is better
> than %X or vice-versa for their string. Date/time is especially important
> here, because a sentence may require different grammar forms of days of the
> week ("On Tuesday, Thiago wrote" versus "The alarm will happen next
> Tuesday").
> Thus, I'd make the parsing of the padding, alignment, precision, width (PADW)
> happen at compile-time, like it is done today for QString::arg(), but allow
> the translators to select variants (of which QString::arg() only has one
> today, "L" for locale formatting of numbers).
> 
> In the same line, are translation files allowed to duplicate the same
> replacement token? If so, are they allowed to specify different padding,
> alignment or precision? That is, could they replace a "{1:%X}" time
> representation with "{1:%H}:{1:02%M}" ?
> 
> 4) What formatting options make sense for translatable strings?
> I don't think anyone should be talking about padding and alignment of strings
> inside strings, especially in GUI and with variable-width fonts, none of which
> the QtCore formatters can have a clue about. It's probably better to allow a
> form of rich text formatting instead, like we do now:
>   trFormat("<p>Error found while copying:\n<center>{}</center>",
>            errorString);
> 
> Or do we simply not restrict at all? Let programmers have the ire of
> translators and UX designers if they use weird things that don't work in GUIs.
> 
> 5) Should the same parser and same rules apply to other uses of formatting?
> Aside from i18n, there would be two more cases for formatting strings inside
> of Qt: logging (including debugging) and generic string formatting. Thus, code
> like:
>   QString fname = qFormat("{}-{:%FT%T%z}.log",
>           qApp->applicationName(), QDateTime::currentDateTime());
>   qFDebug("Opening log file {}", fname);
> 
> In the case above, we used the formatters for QString and QDateTime (whether
> QDateTime would use std::chrono-like formatters or its own is beside the
> point). We'd preferably have all this machinery only once in our codebase.
> 
> This is the reason I want to have this conversation now: we need to focus on
> translated strings, but if we want to have the machinery only once and
> behaviour-compatible, we need to know what it will be before we start
> supporting std::format().
> 
> This raises the question for our string formatters: should we default to
> quoted & escaped? C++23 introduces the "?" specifier for its own strings, but
> we could flip the default for our loggers and instead introduce a "raw" mode.
> This and discussion on how we format our own types can be had later.
> 
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
>   Principal Engineer - Intel DCAI Platform & System Engineering
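
For reference, the name-to-number step from point 2 of the quoted message could
be sketched roughly as below. This is plain standard C++ and only an
illustration: placeholderNames() and numberPlaceholders() are hypothetical names,
not existing lupdate/lrelease functionality, and format specs inside the braces
are not handled.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Collects placeholder names in source order: "{a}: {b}" -> {"a", "b"}.
std::vector<std::string> placeholderNames(std::string_view source)
{
    std::vector<std::string> names;
    for (std::size_t pos = source.find('{'); pos != std::string_view::npos;
         pos = source.find('{', pos + 1)) {
        const std::size_t close = source.find('}', pos);
        if (close == std::string_view::npos)
            break;
        names.emplace_back(source.substr(pos + 1, close - pos - 1));
    }
    return names;
}

// Rewrites "{name}" in the translation to "{index}", where index is the
// position of that name in the source string. Unknown names are kept as-is.
std::string numberPlaceholders(std::string_view source, std::string_view translation)
{
    const std::vector<std::string> names = placeholderNames(source);
    std::string out;
    for (std::size_t i = 0; i < translation.size(); ++i) {
        std::size_t close = std::string_view::npos;
        if (translation[i] == '{')
            close = translation.find('}', i);
        if (close == std::string_view::npos) {
            out += translation[i];
            continue;
        }
        const std::string_view name = translation.substr(i + 1, close - i - 1);
        std::size_t index = 0;
        while (index < names.size() && names[index] != name)
            ++index;
        if (index == names.size())
            out += translation.substr(i, close - i + 1);
        else
            out += '{' + std::to_string(index) + '}';
        i = close;
    }
    return out;
}
```

Given the source string "{error_type}: could not connect: {error_msg}" and the
translation "Connexion échouée ({error_msg}). {error_type}", this produces
"Connexion échouée ({1}). {0}", matching the example in the quoted text.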

