[Development] format-like tr()

Ivan Solovev ivan.solovev at qt.io
Wed Oct 23 11:22:51 CEST 2024


Hi Thiago,

First of all, could you please clarify how you see the potential
implementation?
Do you think that we can reuse some of the things that std::format
provides, or do we need to write everything from scratch?

For my patches that add std::format support to QByteArray(View),
I implemented a parser that is compatible with the standard format specifier for
string types, which is just a small subset of
https://eel.is/c++draft/format.string.std

And I still used std::format to do the actual formatting, because I
didn't want to mess with width and padding on my own.

> a) keep the "n" and make it mandatory, as in {n}

If we use std::format, then we cannot do it, because it does the
argument index parsing on its own, and delegates to a custom formatter
specialization after that. And the argument index can only be a non-negative
integer value.

> That is, could they replace a "{1:%X}" time representation with
> "{1:%H}:{1:02%M}" ?

With std::format you can use the same argument index multiple times, so
it makes sense to allow it.

------------------------------

Ivan Solovev
Senior Software Engineer


________________________________________
From: Development <development-bounces at qt-project.org> on behalf of Thiago Macieira <thiago.macieira at intel.com>
Sent: Tuesday, October 22, 2024 7:58 PM
To: development at qt-project.org; Volker Krause; albert astals cid
Subject: [Development] format-like tr()

I've been pondering std::format for a while and I think for Qt we need a
different approach. Our most important use of a format()-like API would be
translated strings. I think we need to understand that before we tackle
debugging output and other non-categorised uses.

Before anything, is anyone aware of research in this area? Any conclusions we
should reuse?

1) Plural handling
The most significant difference in formatting of generic strings and of
translated strings is the handling of some plural forms. QTranslator uses %n
to denote the key number that will select the translated form. How do we mark
them in {}?

 a) keep the "n" and make it mandatory, as in {n}
 b) keep the "n" when more than one replacement is present, optional otherwise
    "{n} file(s) found: {}"
    "{} file(s) found"
 c) simply use the first argument and adopt standard std::format rules to refer
to it
    "{1}, {0} file(s) found"
    "{} file(s) found"

The advantage of {n} is that it tells the translators and the tooling that
this is a plural-handling string and requires however many different forms in
the translation. For that reason, I'd choose (a).

If we use a mandatory {n}, do other replacements also need to be numbered in
the source code?
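As a rough sketch of how tooling could recognise option (a), the check below looks for a {n} field, with or without a format spec. The function name is hypothetical, and the sketch ignores escaped braces such as "{{n}}":

```cpp
#include <string_view>

// Hypothetical tooling check: does the format string carry the {n}
// plural marker, either bare or followed by a format spec ("{n:...}")?
// Assumption: escaped braces ("{{" and "}}") are not handled here.
bool hasPluralMarker(std::string_view fmt)
{
    return fmt.find("{n}") != std::string_view::npos
        || fmt.find("{n:") != std::string_view::npos;
}
```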

2) Naming the arguments
One thing that can incredibly help the translators is naming the replacement
tokens, such as "{error_type}: could not connect: {error_msg}". This is
possible in the Python original because it interpolates actual variables, but
C++ does not do it. Yet the syntax is there.

Should we explore this? We could have a dummy name in the C++ code, which
won't affect the ordering at all, but could be extracted to the translation
file. In turn, the tooling could replace with numbers by matching IDs.
  trFormat("{error_type}: could not connect: {error_msg}", type, msg);
tooling extracts to .ts / .po:
  "{error_type}: could not connect: {error_msg}"
translator writes
  "Connexion échouée ({error_msg}). {error_type}"
tooling writes to .qm / .mo:
  "Connexion échouée ({1}). {0}"

3) Formatting options in translation files
Unlike QString::arg(), the formatting options for std::format are in the
format string, as in "Foo {:^10.2%EX} bar". Is there any condition under which
translators would want to change those options? I ask this because:
  a) those can confuse translators
  b) would allow us to parse the formatting specifiers at compile time (if
    possible)

I suspect the answer is that translators would never change padding,
alignment, precision or width (some of which may make no sense in translatable
strings) but may want to change some of what is getting formatted. For
example, the %EX above is used in the chrono formatter as "the locale's
alternate time representation" and translators may feel that %EX is better
than %X or vice-versa for their string. Date/time is especially important
here, because a sentence may require different grammar forms of days of the
week ("On Tuesday, Thiago wrote" versus "The alarm will happen next Tuesday").
Thus, I'd make the parsing of the padding, alignment, precision, width (PADW)
happen at compile-time, like it is done today for QString::arg(), but allow
the translators to select variants (of which QString::arg() only has one
today, "L" for locale formatting of numbers).
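One way to picture that split is the sketch below. It is purely illustrative: it assumes a chrono-like spec where the variant part starts at the first '%', and it does not handle a '%' used as the fill character. The names `SplitSpec` and `splitSpec` are hypothetical. Because the function is constexpr, the PADW part can be checked at compile time:

```cpp
#include <string_view>

// Hypothetical result type for this illustration.
struct SplitSpec {
    std::string_view padw;    // fill, alignment, width, precision
    std::string_view variant; // conversion part, starting at the first '%'
};

// Splits a spec such as "^10.2%EX" into its PADW part and its variant part.
// Assumption: '%' never appears as the fill character.
constexpr SplitSpec splitSpec(std::string_view spec)
{
    const auto pos = spec.find('%');
    if (pos == std::string_view::npos)
        return {spec, {}};
    return {spec.substr(0, pos), spec.substr(pos)};
}

// The split is usable in constant expressions.
static_assert(splitSpec("^10.2%EX").padw == "^10.2");
static_assert(splitSpec("^10.2%EX").variant == "%EX");
```

Under this scheme the source code would fix "^10.2", while a translation could swap "%EX" for "%X".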

Along the same lines, are translation files allowed to duplicate the same
replacement token? If so, are they allowed to specify different padding,
alignment or precision? That is, could they replace a "{1:%X}" time
representation with "{1:%H}:{1:02%M}" ?

4) What formatting options make sense for translatable strings?
I don't think anyone should be talking about padding and alignment of strings
inside strings, especially in GUI and with variable-width fonts, none of which
the QtCore formatters can have a clue about. It's probably better to allow a
form of rich text formatting instead, like we do now:
  trFormat("<p>Error found while copying:\n<center>{}</center>", errorString);

Or do we simply not restrict at all? Let programmers have the ire of
translators and UX designers if they use weird things that don't work in GUIs.

5) Should the same parser and same rules apply to other uses of formatting?
Aside from i18n, there would be two more cases for formatting strings inside
of Qt: logging (including debugging) and generic string formatting. Thus, code
like:
  QString fname = qFormat("{}-{:%FT%T%z}.log",
          qApp->applicationName(), QDateTime::currentDateTime());
  qFDebug("Opening log file {}", fname);

In the case above, we used the formatters for QString and QDateTime (whether
QDateTime would use std::chrono-like formatters or its own is beside the
point). We'd preferably have all this machinery only once in our codebase.

This is the reason I want to have this conversation now: we need to focus on
translated strings, but if we want to have the machinery only once and
behaviour-compatible, we need to know what it will be before we start
supporting std::format().

This raises the question for our string formatters: should we default to quoted
& escaped? C++23 introduces the "?" specifier for its own strings, but we could
flip the default for our loggers and instead introduce a "raw" mode. This and
discussion on how we format our own types can be had later.
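To make "quoted & escaped" concrete, here is a hand-written sketch of what such a default could produce for a logger. The function name is hypothetical, and it covers only a handful of characters, so it merely approximates what the C++23 "?" specifier does:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: wraps a string in double quotes and escapes a few
// common control characters, the quote character and the backslash.
// This is an approximation, not the full C++23 escaping rules.
std::string quotedForLog(std::string_view s)
{
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
        case '\n': out += "\\n"; break;
        case '\t': out += "\\t"; break;
        case '\r': out += "\\r"; break;
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        default:   out += c; break;
        }
    }
    out += '"';
    return out;
}
```

A "raw" mode would then simply bypass this step and emit the string unchanged.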

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Principal Engineer - Intel DCAI Platform & System Engineering

