[Development] QStringLiteral is broken(ish) on MSVC (compiler bug?)
giuseppe.dangelo at kdab.com
Fri Mar 15 13:27:09 CET 2019
On 14/03/19 22:48, Thiago Macieira wrote:
> char16_t text1[] = u"" "\u0102";
> It produces, without /utf-8 (see https://msvc.godbolt.org/z/EvtKzq):
> ?text1@@3PA_SA DB '?', 00H, 00H, 00H ; text1
> And with /utf-8:
> ?text1@@3PA_SA DB 0c4H, 00H, 01aH, ' ', 00H, 00H ; text1
> Those two values make no sense. U+0102 is neither 0x003f (question mark) nor
> 0x00c4 0x201a ("Ä‚"). This is a clear compiler bug. An interpretation of the
> C++11 standard could say that the translation is correct for the no-/utf-8
> build, but with /utf-8 or /execution-charset:utf-8 it should have produced the
> correct result.
Actually, those values do have some connection with the input. It looks
like MSVC is double-encoding it:
* "\u0102" under a UTF-8 execution charset produces a string containing
0xC4 0x82 (the UTF-8 encoding of U+0102);
* that string literal is a generic narrow string literal (unprefixed).
When concatenating it to a u-prefixed string literal, MSVC somehow thinks
it's in its native codepage instead of UTF-8...
* so it now re-encodes 0xC4 0x82 from CP1252 to UTF-16, yielding
0x00 0xC4 0x20 0x1A, which is what ends up in text1 (after fixing the
endianness).
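The double encoding described above can be reproduced by hand. The following is only a sketch of the suspected sequence, not MSVC's actual implementation; the helper names (utf8_two_byte, cp1252_to_utf16, misdecode) are made up for illustration, and the CP1252 table is reduced to the single special case this example needs.

```cpp
#include <cstdint>
#include <vector>

// Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
std::vector<std::uint8_t> utf8_two_byte(char32_t cp)
{
    return { static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
             static_cast<std::uint8_t>(0x80 | (cp & 0x3F)) };
}

// Hypothetical, deliberately incomplete CP1252 decoder: only 0x82 is
// special-cased. ASCII and 0xA0..0xFF map 1:1 to Unicode; the other
// bytes in 0x80..0x9F are NOT handled correctly by this sketch.
char16_t cp1252_to_utf16(std::uint8_t b)
{
    if (b == 0x82)
        return 0x201A; // SINGLE LOW-9 QUOTATION MARK
    return b;
}

// The suspected bug: encode as UTF-8, then decode those bytes as CP1252.
std::vector<char16_t> misdecode(char32_t cp)
{
    std::vector<char16_t> out;
    for (std::uint8_t b : utf8_two_byte(cp))
        out.push_back(cp1252_to_utf16(b));
    return out;
}
```

Calling misdecode(0x0102) gives the code units 0x00C4 and 0x201A. Stored little-endian, that is the byte sequence c4 00 1a 20, matching the /utf-8 assembly output quoted above (0x20 is printed there as ' ').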
The mapping of \u escape sequences to the execution character set
happens before string literal concatenation (translation phases 5/6).
But AFAIU the mapping is purely symbolic, and has nothing to do with any
actual encoding, so MSVC is at fault here?
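For comparison, this is the behaviour I would expect under that reading of the standard, written as compile-time checks. It is a sketch and I have only reasoned about it for compilers that handle the concatenation correctly; judging from the output quoted above, an affected MSVC would presumably trip the assertions.

```cpp
// u"" followed by an unprefixed literal: after concatenation the whole
// thing is a char16_t string literal.
constexpr char16_t text1[] = u"" "\u0102";

// One UTF-16 code unit plus the terminating null.
static_assert(sizeof(text1) / sizeof(text1[0]) == 2,
              "expected one code unit plus terminator");

// U+0102 should come through the concatenation unchanged.
static_assert(text1[0] == 0x0102,
              "expected U+0102 as a single UTF-16 code unit");
```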
My 2 c,
Giuseppe D'Angelo | giuseppe.dangelo at kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts