[Development] std::format support for Qt string types
    Ivan Solovev 
    ivan.solovev at qt.io
       
    Wed Jun  5 15:18:33 CEST 2024
    
    
  
Hi,
I'm currently working on introducing std::format support for some of the Qt
types. I decided to start with the various Qt string types, and I have some
open questions regarding the implementation that I'd like to discuss.
First, I'd like to give a very short summary of my understanding of how
std::format works in plain C++ when it comes to string formatting.
Basically, we have two kinds of formatters:
* std::formatter<T, char>, which handles the std::string, const char *, and
 const char (&)[N] cases.
* std::formatter<T, wchar_t>, which handles the std::wstring, const wchar_t *,
 and const wchar_t (&)[N] cases.
The encoding for the wide char strings is usually known - it's either UTF-16
on Windows or UTF-32 on Linux and macOS.
But what is the encoding for the char strings? The answer is that std::format
does not care. It just tries to format the characters according to the format
string. What you see in the terminal fully depends on your terminal encoding.
So, back to the main question: how should we format Qt string types?
The support for wide char formatters is straightforward - we can use
QString::toStdWString() and be sure that we do not get any unreadable
characters in the formatted output.
I already have a WIP patch implementing it [0].
But what should we do with the char formatters? Should we aim for the formatted
strings to always be readable, or should we just not care, like
std::formatter<T, char> does?
I see several options here:
1. Treat everything as UTF-8
Traditionally all QString(View) constructors taking char arrays or std::string
treat the data as UTF-8. Also, QString::toStdString() provides a UTF-8 encoded
std::string. So this would be sort of an expected behavior for Qt users.
With this approach QLatin1StringView should also be converted to UTF-8 before
being processed by the formatter.
2. Treat everything as Local8Bit
Basically similar to the previous approach, but use toLocal8Bit() instead of
toUtf8() when passing the data to the formatter. On Linux and macOS that would
actually be equivalent to the first approach, because toLocal8Bit() simply
assumes UTF-8 as an encoding. On Windows it would use CP_ACP to do the
conversion.
In this case the behavior would be similar to what qDebug() does.
The drawback is that the formatted string might be different from the original
one. For example, `Ü` might be replaced with `U`, some other symbols might be
replaced with `?`, depending on the currently selected code page.
Similarly to the previous option, QLatin1StringView and QUtf8StringView should
also be converted to Local8Bit before formatting.
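To illustrate the drawback, here is a deliberately simplified sketch of a lossy narrowing conversion. It assumes Latin-1 as a stand-in for the active code page and only does the `?` substitution, not the best-fit mapping (`Ü` to `U`) mentioned above. The helper name toSingleByteLossy is hypothetical, and the real conversion done by toLocal8Bit() is platform-specific.

```cpp
#include <string>
#include <string_view>

// Narrows code points to a single-byte encoding (Latin-1 as a stand-in for
// the active code page). Anything not representable becomes '?'.
inline std::string toSingleByteLossy(std::u32string_view text)
{
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        // Code points above U+00FF have no Latin-1 representation.
        out.push_back(cp <= 0xFF ? static_cast<char>(static_cast<unsigned char>(cp)) : '?');
    }
    return out;
}
```

Once a character has been replaced with `?`, the original text cannot be recovered from the formatted output.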
3. Try to not guess the encoding for the user
Basically, for QUtf8StringView and QLatin1StringView their encoding is
explicitly mentioned in the names of the classes, so we can just consider that
if the users use these classes with std::format, they expect to have UTF-8
or Latin1 output respectively.
The question here is how to deal with QString(View):
 3a. Convert it to UTF-8, because that's the pre-existing behavior, which
     should already be known to users.
 3b. Do not implement std::formatter<QString(View), char> at all and let
     the users explicitly convert QString to something else first.
Option 3b is inconvenient and defeats the purpose of std::format support
for Qt types, so I'd personally prefer 3a here.
The WIP patch [1] currently implements approach 2, but I'm actually leaning
towards updating it to approach 3 (with 3a for QString(View)).
[0]: https://codereview.qt-project.org/c/qt/qtbase/+/563859
[1]: https://codereview.qt-project.org/c/qt/qtbase/+/559758
I'd like to hear more opinions on how to proceed here, so please
share your ideas!
Best regards,
Ivan
------------------------------
Ivan Solovev
Senior Software Engineer
The Qt Company GmbH
Erich-Thilo-Str. 10
12489 Berlin, Germany
ivan.solovev at qt.io
www.qt.io
Geschäftsführer: Mika Pälsi,
Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht
Charlottenburg, HRB 144331 B