[Development] std::format support for Qt string types
Giuseppe D'Angelo
giuseppe.dangelo at kdab.com
Thu Jun 6 17:07:31 CEST 2024
Hello,
On 05/06/2024 15:18, Ivan Solovev via Development wrote:
> Hi,
>
> I'm now working on introducing std::format support for some of the Qt types.
> I decided to start with the variety of Qt string types, and I have some open
> question regarding the implementation that I want to discuss.
>
> First, I'd like to give a very short summary of my understanding of how
> std::format works in plain C++ when it comes to string formatting.
> Basically, we have two types of formatters:
> * std::formatter<T, char> that handle std::string, const char *, and
> const char (&)[N] overloads.
> * std::formatter<T, wchar_t> that handle std::wstring, const wchar_t *,
> and const wchar_t (&)[N] overloads.
>
> The encoding for the wide char strings is usually known - it's either UTF-16
> on Windows or UTF-32 on Linux and macOS.
> But what is the encoding for the char strings? The answer is that std::format
> does not care. It just tries to format the characters according to the format
> string. What you see in the terminal fully depends on your terminal encoding.
I think we should conceptually separate formatting from printing on a
terminal. std::format isn't _just_ for printing on terminals (we now
have std::print for that). Having said it, I admit that I've fallen
quite behind
> So, back to the main question. How should we format Qt string types?
Just for the sake of discussion, we can also leave the problem unsolved
until std::format works with Unicode strings. As much as that's a pain
point for users, we won't paint ourselves into a corner (see below).
> The support for wide char formatters is straightforward - we can use
> QString::toStdWString() and be sure that we do not get any unreadable
> characters in the formatted output.
> I already have a WIP patch implementing it [0].
In general, I'm not too fond of the idea that we need to re-encode
strings (= allocations) in order to format them, but I don't see an easy
way out given the tools at our disposal...
> But what to do with the char formatters? Should we aim for the formatted
> strings to be always readable, or should we just not care, like the
> std::formatter<char> does?
What do you mean by "readable" here?
> I see several options here:
>
> 1. Treat everything as UTF-8
>
> Traditionally all QString(View) constructors taking char arrays or std::string
> treat the data as UTF-8. Also, QString::toStdString() provides a UTF-8 encoded
> std::string. So this would be sort of an expected behavior for Qt users.
>
> With this approach QLatin1StringView should also be converted to UTF-8 before
> being processed by the formatter.
That definitely sounds appealing, in the sense that in any text-based
API we expect `char` to be UTF-8. So, formatting into chars means
formatting into UTF-8.
> 2. Treat everything as Local8Bit
>
> Basically similar to the previous approach, but use toLocal8Bit() instead of
> toUtf8() when passing the data to the formatter. On Linux and macOS that would
> actually be equivalent to the first approach, because toLocal8Bit() simply
> assumes UTF-8 as an encoding. On Windows it would use CP_ACP to do the
> conversion.
>
> In this case the behavior would be similar to what qDebug() does.
Again, I'm not really sure about entangling consoles with this.
If you go for this approach and std::print a QString on Windows, what
kind of output do you get?
> The drawback is that the formatted string might be different from the original
> one. For example, `Ü` might be replaced with `U`, some other symbols might be
> replaced with `?`, depending on the currently selected code page.
>
> Similarly to the previous option, QLatin1StringView and QUtf8StringView should
> also be converted to Local8Bit before formatting.
>
> 3. Try to not guess the encoding for the user
>
> Basically, for QUtf8StringView and QLatin1StringView their encoding is
> explicitly mentioned in the names of the classes, so we can just consider that
> if the users use these classes with std::format, they expect to have UTF-8
> or Latin1 output respectively.
I'm not following this. If I do

    std::format("{} {}", utf8string, latin1string)

what am I supposed to get out? A string which is a mix of two different
encodings? I don't think that's ever possibly wanted.
>
> The question here is how to deal with QString(View)?
> 3a. Convert it to UTF-8, because that's the pre-existing behavior which
> should be known to the users.
> 3b. Do not implement std::formatter<QString(View), char> at all and let
> the users explicitly convert QString to something else first.
>
> Option 3b is inconvenient and defeats the purpose of std::format support
> for Qt types, so I'd personally prefer 3a here.
The concern I alluded to above is this: suppose that tomorrow we have
a formatter for `const char16_t *` into char. This formatter does some
kind of transcoding. Then QString(View) ought to do precisely the same!
If we take a different decision now, we risk having compatibility
problems down the line.
Now, I don't really know if formatting char16_t is anywhere on SG16's
radar in the short term, but that definitely sounds like something to
investigate and report about, in order to make a more informed decision.
(Not to mention formatting _into_ char16_t, which would unlock something
like QString::format to *create* a QString!)
Thanks,
--
Giuseppe D'Angelo | giuseppe.dangelo at kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - Trusted Software Excellence