[Development] MSVC 2015 option /utf8: Qt only or everyone?
Thiago Macieira
thiago.macieira at intel.com
Tue May 10 20:30:30 CEST 2016
I've just found out that MSVC 2015 Update 2 added the compiler option /utf-8,
which is documented as:
/utf-8 set source and execution character set to UTF-8
See https://blogs.msdn.microsoft.com/vcblog/2016/02/22/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/
I'd like to turn that option on for at least all of our own source code. Once
we can drop support for MSVC versions older than 2015 Update 2, we'll be able
to write proper Unicode literals on all systems.
The question is: do I turn it on for user code too? Or just ours?
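Whichever way that goes, code that depends on the option can check for it at compile time. The following is only a sketch, not something Qt does today: the name kSourceDecodedAsUtf8 is made up for illustration, and it assumes the file is saved as UTF-8. A correctly decoded "é" is one code point (two UTF-8 bytes, one UTF-16 unit). A compiler that reads the two bytes as two separate characters produces a longer literal.

```cpp
#include <cstddef>

// Hypothetical guard: true only if the compiler decoded this UTF-8 file as
// UTF-8. Each size below includes the terminating null character.
constexpr bool kSourceDecodedAsUtf8 =
    sizeof(u8"é") == 3                      // c3 a9 00
    && sizeof(u"é") == 2 * sizeof(char16_t); // 00e9 0000

static_assert(kSourceDecodedAsUtf8,
              "source not decoded as UTF-8; with MSVC 2015 Update 2 or later, pass /utf-8");
```

The plain narrow literal is deliberately not checked: a narrow string alone would not reveal a wrong source character set if the compiler copies its bytes through unchanged.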
See also the bottom of the blog post, which says:
> In a future major release of the compiler, we would like to change default
> handling of BOM-less files to assume UTF-8
Proof:
The following file is encoded in UTF-8, as required for Qt sources.
$ cat test.cpp
extern "C" {
extern const char s[] = "Résumé";
extern const char s8[] = u8"Résumé";
extern const char16_t su[] = u"Résumé";
extern const wchar_t sl[] = L"Résumé";
}
Compiled with the /utf-8 option, the data is in the form we'd expect:
s DB 'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
ORG $+7
s8 DB 'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
ORG $+7
su DB 'R', 00H, 0e9H, 00H, 's', 00H, 'u', 00H, 'm', 00H, 0e9H, 00H
DB 00H, 00H
ORG $+2
sl DB 'R', 00H, 0e9H, 00H, 's', 00H, 'u', 00H, 'm', 00H, 0e9H, 00H
DB 00H, 00H
This is the same as what we get on Linux and OS X with Clang and GCC
(assembly listing from Clang on Linux):
s:
.asciz "R\303\251sum\303\251"
.size s, 9
.type s8,@object # @s8
.globl s8
s8:
.asciz "R\303\251sum\303\251"
.size s8, 9
.type su,@object # @su
.globl su
.align 2
su:
.short 82 # 0x52
.short 233 # 0xe9
.short 115 # 0x73
.short 117 # 0x75
.short 109 # 0x6d
.short 233 # 0xe9
.short 0 # 0x0
.size su, 14
Without that option, we had:
s DB 'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
ORG $+7
s8 DB 'R', 0c3H, 083H, 0c2H, 0a9H, 'sum', 0c3H, 083H, 0c2H, 0a9H
DB 00H
ORG $+3
su DB 'R', 00H, 0c3H, 00H, 0a9H, 00H, 's', 00H, 'u', 00H, 'm', 00H
DB 0c3H, 00H, 0a9H, 00H, 00H, 00H
ORG $+6
sl DB 'R', 00H, 0c3H, 00H, 0a9H, 00H, 's', 00H, 'u', 00H, 'm', 00H
DB 0c3H, 00H, 0a9H, 00H, 00H, 00H
The above contains Mojibake for the s8, su and sl variables.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center