[Development] Are char literals L1 or U8 in Qt?

Marc Mutz marc.mutz at qt.io
Mon Jun 10 14:39:26 CEST 2024


Hi,

TL;DR:
- QASV(char) is UTF-8, but QChar(char) is L1
- propose to fix QASV, not QChar
   - iow: char literals remain L1, not become UTF-8
     - but char[] remains UTF-8
- propose to deprecate char and char[] literals for u8 and _L1 in Qt 7
   (= make QT_NO_CAST_FROM_ASCII the default)


While porting QString::arg() to QAnyStringView¹, I noticed that
QAnyStringView(char) produces a 1-byte UTF-8 sequence (which is
invalid unless the character is from the US-ASCII subset), while
QChar(char) produces a valid 1-codepoint UTF-16 "sequence",
interpreting the ctor argument as L1.

Since QASV is supposed to make _one_ function replace all relevant 
overload _sets_, incl. QChar ones², this inconsistency is creating 
problems (first found by arg() test cases failing after porting to QASV).
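
To make the two readings concrete without pulling in Qt, here is a minimal
sketch in plain C++. The helper names fromLatin1() and
isCompleteUtf8Sequence() are hypothetical and exist only for this
illustration; they model what the L1 and UTF-8 interpretations of a single
char imply, not how QChar or QAnyStringView are actually implemented.

```cpp
#include <cassert>

// L1 reading: every byte value 0x00..0xFF maps to the Unicode code point
// with the same number, so the result is always one valid UTF-16 code unit.
constexpr char16_t fromLatin1(char c) noexcept
{
    return static_cast<char16_t>(static_cast<unsigned char>(c));
}

// UTF-8 reading: a lone byte is a complete sequence only if it is US-ASCII.
// Bytes >= 0x80 are lead or continuation bytes (or invalid outright).
constexpr bool isCompleteUtf8Sequence(char c) noexcept
{
    return static_cast<unsigned char>(c) < 0x80;
}

// For US-ASCII, both readings agree.
static_assert(fromLatin1('a') == 0x0061);
static_assert(isCompleteUtf8Sequence('a'));

// Runtime spot check for the byte 0xE4 ('ä' in L1), where they disagree.
inline void checkByteE4()
{
    const char c = '\xE4';
    assert(fromLatin1(c) == 0x00E4);     // valid: U+00E4
    assert(!isCompleteUtf8Sequence(c));  // invalid as 1-byte UTF-8
}
```

So a char such as '\xE4' has a well-defined meaning under the L1 reading,
but none as a standalone UTF-8 sequence.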

As the original author, I can confirm that the intent was to match 
whatever QChar does, so I consider the current QASV(char) behaviour to 
be buggy.

OTOH, an argument can be made that, since char[] is considered UTF-8 in
Qt, so should `char`, and I think no-one is considering anything else
when it comes to the result of QUtf8StringView::first(1). But this is
about char literals.

C++ solves this by banning non-US-ASCII u8'' literals.
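
As a small illustration of that rule (a sketch; the variable name asciiUnit
is made up for this example): a u8 character literal must fit into a single
UTF-8 code unit, so only US-ASCII characters are accepted, while u8 string
literals carry no such restriction.

```cpp
#include <cassert>

// u8'a' has type char8_t in C++20 and char in C++17; either way it holds
// the single UTF-8 code unit 0x61.
constexpr auto asciiUnit = u8'a';
static_assert(static_cast<unsigned char>(asciiUnit) == 0x61);

// The following is ill-formed, because U+00E4 needs two UTF-8 code units:
// constexpr auto bad = u8'\u00E4';

// A u8 string literal may contain it: two code units plus the terminator.
static_assert(sizeof(u8"\u00E4") == 3);
```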

For Qt, my plan was to wait until we can depend on C++20's char8_t and
then eventually make QT_NO_CAST_FROM_ASCII the default (and keep u8""
and _L1 working implicitly). We will then have the same problem for
char8_t, but the standard has kinda decided for us: char8_t is always
UTF-8, incl. single chars (but the language bans incompatible literals,
something we can't do for char, which is one of the reasons I think
QT_NO_CAST_FROM_ASCII (or, rather, _FROM_CHAR) should be the default
over the medium term).

Since there are four bugs³ in QString::arg() that are all fixed by the 
existing patch chain porting the whole thing to QAnyStringView, and 
since the medium-term goal is to deprecate use of char for characters 
and char[] for strings (QT_ASCII_WARN), anyway, I would like to fix 
QASV(char) to mean QASV(QChar(char)), not redefine char literals as 
UTF-8 and break many more users (QASV is relatively new; QChar(char) and 
QString::arg(char) are there since before Qt 4).

What do you think?

Thanks,
Marc

¹ chain ending in https://codereview.qt-project.org/c/qt/qtbase/+/562895

² See https://www.qt.io/blog/qstringview-diaries-qanystringview:
> First, it would need to accept anything that the overload sets above
> would accept, too, to wit:
>
[...]
>     QChar, or anything that implicitly converts to it (within reason; QChar's ctors are a mess)
[...]

³ to wit:
- https://bugreports.qt.io/browse/QTBUG-126053 (char8_t)
- https://bugreports.qt.io/browse/QTBUG-126054 (wchar_t)
- https://bugreports.qt.io/browse/QTBUG-126055 (qfloat16)
- https://bugreports.qt.io/browse/QTBUG-125588 (char16_t)
- and the issue at hand:
   https://bugreports.qt.io/browse/QTBUG-125730 (char)

-- 
Marc Mutz <marc.mutz at qt.io> (he/his)
Principal Software Engineer

The Qt Company
Erich-Thilo-Str. 10 12489
Berlin, Germany
www.qt.io

Geschäftsführer: Mika Pälsi, Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht Charlottenburg,
HRB 144331 B

