[Development] Are char literals L1 or U8 in Qt?
Ivan Solovev
ivan.solovev at qt.io
Mon Jun 10 15:53:13 CEST 2024
> I would like to fix
> QASV(char) to mean QASV(QChar(char)), not redefine char literals as
> UTF-8 and break many more users (QASV is relatively new; QChar(char) and
> QString::arg(char) are there since before Qt 4).
>
> What do you think?
+1 for this proposal.
I do not think that QASV(char) providing a broken UTF-8 sequence makes sense.
We also have
QString s('\xe4'); // calls QString(QChar) c-tor, using an implicit QChar(char) c-tor
producing "ä", not an invalid UTF-8 sequence.
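Here is a plain C++ illustration of the Latin-1 reading described above. It is not Qt code, and latin1ToUtf16, Utf8Bytes and encodeUtf8 are hypothetical names. A char is widened by taking its unsigned byte value as the code point, so '\xe4' becomes U+00E4. Encoded correctly, that is the two UTF-8 bytes C3 A4, not the lone byte E4:

```cpp
// Hypothetical helpers for illustration only; not part of Qt's API.

// Latin-1 reading of a char, as QChar(char) does: the byte value is the
// Unicode code point. Convert via unsigned char first, because plain char
// may be signed and '\xe4' would otherwise sign-extend.
constexpr char16_t latin1ToUtf16(char c) noexcept
{
    return static_cast<char16_t>(static_cast<unsigned char>(c));
}

// UTF-8 encoding of a code point below U+0800 (enough for all of Latin-1).
struct Utf8Bytes
{
    unsigned char bytes[2];
    int size;
};

constexpr Utf8Bytes encodeUtf8(char16_t cp) noexcept
{
    if (cp < 0x80)  // US-ASCII: one byte, identical to the Latin-1 value
        return { { static_cast<unsigned char>(cp), 0 }, 1 };
    // Two-byte form 110xxxxx 10xxxxxx; caller guarantees cp < 0x800
    return { { static_cast<unsigned char>(0xC0 | (cp >> 6)),
               static_cast<unsigned char>(0x80 | (cp & 0x3F)) }, 2 };
}
```

With these definitions, encodeUtf8(latin1ToUtf16('\xe4')) yields the bytes 0xC3 0xA4, the proper UTF-8 form of "ä".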
------------------------------
Ivan Solovev
Senior Software Engineer
The Qt Company GmbH
Erich-Thilo-Str. 10
12489 Berlin, Germany
ivan.solovev at qt.io
www.qt.io
Geschäftsführer: Mika Pälsi,
Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht
Charlottenburg, HRB 144331 B
________________________________________
From: Development <development-bounces at qt-project.org> on behalf of Marc Mutz via Development <development at qt-project.org>
Sent: Monday, June 10, 2024 2:39 PM
To: development at qt-project.org
Subject: [Development] Are char literals L1 or U8 in Qt?
Hi,
TL;DR:
- QASV(char) is UTF-8, but QChar(char) is L1
- propose to fix QASV, not QChar
- iow: char literals remain L1, not become UTF-8
- but char[] remains UTF-8
- propose to deprecate char and char[] literals in favour of u8 and _L1 in Qt 7
(= make QT_NO_CAST_FROM_ASCII the default)
While porting QString::arg() to QAnyStringView¹, I've noticed that
QAnyStringView(char) is producing a 1-byte UTF-8 sequence (which is
invalid unless the character is from the US-ASCII subset), while
QChar(char) is producing a valid 1-codepoint UTF-16
"sequence", interpreting the ctor argument as L1.
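The following standalone C++ sketch shows why the 1-byte UTF-8 reading only works for US-ASCII. It uses a hypothetical helper, not Qt code, that classifies a byte by the role it can play in UTF-8. Only 0x00-0x7F is a complete sequence on its own. '\xe4', for instance, is the lead byte of a three-byte sequence, so a view containing just that byte is malformed:

```cpp
// Hypothetical helpers for illustration only; not part of Qt's API.

// Returns how many bytes a UTF-8 sequence starting with byte c must have,
// or 0 if c can never start a sequence (continuation bytes 0x80-0xBF,
// and 0xC0, 0xC1, 0xF5-0xFF, which never appear in well-formed UTF-8).
constexpr int utf8SequenceLength(char c) noexcept
{
    const unsigned char b = static_cast<unsigned char>(c);
    if (b < 0x80) return 1;                 // US-ASCII
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

// A single char is a complete, valid UTF-8 sequence only if it is US-ASCII.
constexpr bool isCompleteUtf8Sequence(char c) noexcept
{
    return utf8SequenceLength(c) == 1;
}
```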
Since QASV is supposed to make _one_ function replace all relevant
overload _sets_, incl. QChar ones², this inconsistency is creating
problems (first found by arg() test cases failing after porting to QASV).
As the original author, I can confirm that the intent was to match
whatever QChar does, so I consider the current QASV(char) behaviour to
be buggy.
OTOH, an argument can be made that, since char[] is considered UTF-8 in
Qt, so should `char`, and I think no-one is considering anything else
when it comes to the result of QUtf8StringView::first(1). But this is
about char literals.
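This distinction can be sketched with std::string_view standing in for QUtf8StringView. The analogy is not Qt code. "ä" stored as UTF-8 is two code units, and taking the first one yields the lone byte 0xC3. That byte is a UTF-8 code unit (an incomplete lead byte), not the Latin-1 character U+00C3 'Ã':

```cpp
#include <string_view>

// Analogy only: std::string_view stands in for a UTF-8 string view.
// "\xc3\xa4" is U+00E4 'ä' encoded as UTF-8 (two code units).
constexpr std::string_view utf8Auml = "\xc3\xa4";

// The equivalent of first(1): only the first of the two code units of 'ä'.
constexpr std::string_view firstUnit = utf8Auml.substr(0, 1);
```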
C++ solves this by banning non-US-ASCII u8'' literals.
For Qt, my plan was to wait until we can depend on C++20's char8_t and
then eventually make QT_NO_CAST_FROM_ASCII the default (and keep u8""
and _L1 working implicitly). We will then have the same problem for
char8_t, but the standard has kinda decided for us: char8_t is always
UTF-8, incl. single chars (but the language bans incompatible literals,
something we can't do for char, which is one of the reasons I think
QT_NO_CAST_FROM_ASCII (or, rather, _FROM_CHAR) should be the default
over the medium term).
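The following minimal C++ sketch shows why char8_t settles the question at the type level. The names are hypothetical, not Qt API. In C++20, u8'' literals have the distinct type char8_t, so an overload set can give them UTF-8 meaning while plain char keeps the Latin-1 meaning of QChar(char). The guard on the __cpp_char8_t feature-test macro keeps the sketch compiling in C++17, where u8'' literals are plain char. In both standards, u8'ä' is ill-formed because it does not fit in one UTF-8 code unit:

```cpp
// Hypothetical sketch for illustration only; not part of Qt's API.
enum class Interpretation { Latin1, Utf8 };

// Plain char: Latin-1, matching QChar(char).
constexpr Interpretation interpretationOf(char) noexcept
{
    return Interpretation::Latin1;
}

#if defined(__cpp_char8_t)
// char8_t (C++20): always a UTF-8 code unit.
constexpr Interpretation interpretationOf(char8_t) noexcept
{
    return Interpretation::Utf8;
}
#endif

// u8'' literals are limited to a single UTF-8 code unit, so only
// US-ASCII compiles: u8'a' is fine, u8'ä' would be rejected.
constexpr auto asciiU8 = u8'a';
static_assert(sizeof(asciiU8) == 1, "a UTF-8 code unit is one byte");
```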
Since there are four bugs³ in QString::arg() that are all fixed by the
existing patch chain porting the whole thing to QAnyStringView, and
since the medium-term goal is to deprecate use of char for characters
and char[] for strings (QT_ASCII_WARN), anyway, I would like to fix
QASV(char) to mean QASV(QChar(char)), not redefine char literals as
UTF-8 and break many more users (QASV is relatively new; QChar(char) and
QString::arg(char) are there since before Qt 4).
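The difference between the two readings can be modelled in a few lines of standalone C++. This is a hypothetical sketch: SingleCharView, viewAsUtf8, viewAsLatin1 and decode are invented names, not QAnyStringView's actual implementation. The same byte is stored either as one UTF-8 code unit, which is what QASV(char) does today, or as one UTF-16 code unit after Latin-1 widening, which is the proposal. For US-ASCII both decode to the same code point, so only non-ASCII chars change meaning:

```cpp
// Hypothetical sketch for illustration only; not Qt's implementation.
struct SingleCharView
{
    enum class Encoding { Utf8, Utf16 };
    Encoding encoding;
    char16_t unit; // the single stored code unit
};

// Current QASV(char) behaviour as described above: a 1-byte UTF-8 view.
constexpr SingleCharView viewAsUtf8(char c) noexcept
{
    return { SingleCharView::Encoding::Utf8,
             static_cast<char16_t>(static_cast<unsigned char>(c)) };
}

// Proposed behaviour, QASV(QChar(char)): widen as Latin-1 to one UTF-16 unit.
constexpr SingleCharView viewAsLatin1(char c) noexcept
{
    return { SingleCharView::Encoding::Utf16,
             static_cast<char16_t>(static_cast<unsigned char>(c)) };
}

// Decodes the view to a code point, or -1 if it is not a complete,
// valid sequence on its own.
constexpr int decode(SingleCharView v) noexcept
{
    if (v.encoding == SingleCharView::Encoding::Utf8)
        return v.unit < 0x80 ? v.unit : -1;  // lone non-ASCII byte is malformed
    return (v.unit >= 0xD800 && v.unit <= 0xDFFF) ? -1 : v.unit; // lone surrogate
}
```

With these definitions, decode(viewAsUtf8('\xe4')) is -1 while decode(viewAsLatin1('\xe4')) is 0xE4, and both agree on 'a'.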
What do you think?
Thanks,
Marc
¹ chain ending in https://codereview.qt-project.org/c/qt/qtbase/+/562895
² See https://www.qt.io/blog/qstringview-diaries-qanystringview:
> First, it would need to accept anything that the overload sets above
> would accept, too, to wit:
>
[...]
> QChar, or anything that implicitly converts to it (within reason; QChar's ctors are a mess)
[...]
³ to wit:
- https://bugreports.qt.io/browse/QTBUG-126053 (char8_t)
- https://bugreports.qt.io/browse/QTBUG-126054 (wchar_t)
- https://bugreports.qt.io/browse/QTBUG-126055 (qfloat16)
- https://bugreports.qt.io/browse/QTBUG-125588 (char16_t)
- and the issue at hand:
https://bugreports.qt.io/browse/QTBUG-125730 (char)
--
Marc Mutz <marc.mutz at qt.io> (he/his)
Principal Software Engineer
The Qt Company
Erich-Thilo-Str. 10 12489
Berlin, Germany
www.qt.io
Geschäftsführer: Mika Pälsi, Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht Charlottenburg,
HRB 144331 B
--
Development mailing list
Development at qt-project.org
https://lists.qt-project.org/listinfo/development