[Development] MSVC 2015 option /utf-8: Qt only or everyone?

Thiago Macieira thiago.macieira at intel.com
Tue May 10 20:30:30 CEST 2016


I've just found out that MSVC 2015 Update 2 added the compiler option /utf-8, 
which is documented as:

	/utf-8 set source and execution character set to UTF-8

See https://blogs.msdn.microsoft.com/vcblog/2016/02/22/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/

I'd like to turn that option on for at least all of our source code. Once we 
can drop support for MSVC versions older than 2015 Update 2, we'll be able to 
write proper Unicode literals on all systems.
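
A compile-time probe along the following lines could tell whether the
execution character set is UTF-8. This is only a sketch: the names probe and
execCharsetLooksLikeUtf8 are made up for illustration and are not part of Qt.
It relies on the universal character name \u00e9 being converted to the
execution character set, so it says nothing about how the compiler decodes
the source file itself.

```cpp
// Hypothetical probe, not an existing Qt macro or API.
// "\u00e9" (é) is converted to the execution character set at compile time.
// In UTF-8 it occupies two bytes (0xC3 0xA9) plus the terminating NUL.
constexpr char probe[] = "\u00e9";

constexpr bool execCharsetLooksLikeUtf8 =
    sizeof(probe) == 3
    && static_cast<unsigned char>(probe[0]) == 0xC3
    && static_cast<unsigned char>(probe[1]) == 0xA9;
```

With GCC and Clang defaults I'd expect this to be true. With MSVC it should
only be true when the execution character set has been set to UTF-8, for
example by /utf-8.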

The question is: do I turn it on for user code too? Or just ours?

See also the bottom of the blog post, which says:
> In a future major release of the compiler, we would like to change default
> handling of BOM-less files to assume UTF-8

Proof:

The following file is encoded in UTF-8, as required for Qt sources.

$ cat test.cpp
extern "C" {
extern const char s[] = "Résumé";
extern const char s8[] = u8"Résumé";
extern const char16_t su[] = u"Résumé";
extern const wchar_t sl[] = L"Résumé";
}

Compiled with the /utf-8 option, the data is in the form we'd expect:
s       DB      'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
        ORG $+7
s8      DB      'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
        ORG $+7
su      DB      'R', 00H, 0e9H, 00H, 's', 00H, 'u', 00H, 'm', 00H, 0e9H, 00H
        DB      00H, 00H
        ORG $+2
sl      DB      'R', 00H, 0e9H, 00H, 's', 00H, 'u', 00H, 'm', 00H, 0e9H, 00H
        DB      00H, 00H

Which is the same as what we've got on Linux and OS X with Clang and GCC 
(assembly listing from Clang on Linux):

s:
        .asciz  "R\303\251sum\303\251"
        .size   s, 9

        .type   s8,@object              # @s8
        .globl  s8
s8:
        .asciz  "R\303\251sum\303\251"
        .size   s8, 9

        .type   su,@object              # @su
        .globl  su
        .align  2
su:
        .short  82                      # 0x52
        .short  233                     # 0xe9
        .short  115                     # 0x73
        .short  117                     # 0x75
        .short  109                     # 0x6d
        .short  233                     # 0xe9
        .short  0                       # 0x0
        .size   su, 14

Without that option, we had:

s       DB      'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
        ORG $+7
s8      DB      'R', 0c3H, 083H, 0c2H, 0a9H, 'sum', 0c3H, 083H, 0c2H, 0a9H
        DB      00H
        ORG $+3
su      DB      'R', 00H, 0c3H, 00H, 0a9H, 00H, 's', 00H, 'u', 00H, 'm', 00H
        DB      0c3H, 00H, 0a9H, 00H, 00H, 00H
        ORG $+6
sl      DB      'R', 00H, 0c3H, 00H, 0a9H, 00H, 's', 00H, 'u', 00H, 'm', 00H
        DB      0c3H, 00H, 0a9H, 00H, 00H, 00H

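The above contains mojibake for the s8, su and sl variables: each byte of the
UTF-8 sequence 0xC3 0xA9 was treated as a separate character and converted on
its own, instead of the pair being decoded as the single character é (U+00E9).
Only s comes out right, because its bytes were copied through unchanged.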
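
The broken output can be reproduced by hand. The sketch below assumes the
compiler read the file as a single-byte code page in which 0xC3 and 0xA9 map
to U+00C3 and U+00A9 (true for Latin-1 and Windows-1252, which differ only in
0x80..0x9F). The function names are made up for illustration.

```cpp
#include <string>

// Treat every byte of a UTF-8 string as one Latin-1 character and encode
// the result as UTF-8 again. This mimics what happened to the u8 literal.
std::string misreadAsLatin1ThenUtf8(const std::string &utf8)
{
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);                  // ASCII: unchanged
        } else {
            out += static_cast<char>(0xC0 | (byte >> 6));    // lead byte
            out += static_cast<char>(0x80 | (byte & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Treat every byte as one Latin-1 character and widen it to a 16-bit unit.
// This mimics what happened to the u"" and L"" literals.
std::u16string misreadAsLatin1ThenUtf16(const std::string &utf8)
{
    std::u16string out;
    for (unsigned char byte : utf8)
        out += static_cast<char16_t>(byte);
    return out;
}
```

Feeding the bytes 'R', 0xC3, 0xA9, 's', 'u', 'm', 0xC3, 0xA9 to the first
function yields the s8 listing above (0xC3 0x83 0xC2 0xA9 in place of each é),
and the second yields the su listing (0x00C3 0x00A9 in place of each é).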
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



