[Development] MSVC 2015 option /utf8: Qt only or everyone?

Lars Knoll Lars.Knoll at qt.io
Wed May 11 09:59:38 CEST 2016


Finally!

We should certainly turn it on for Qt. 

User code is a bit more sensitive. What are we currently doing on the other compilers? Are we assuming UTF-8 as the input encoding by default? If yes, we should aim for consistency and turn it on for user code on MSVC 2015 as well, but there should be an option to disable it, and we'd need to clearly document this in the changelog.
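
One way to check what a given compiler does with narrow literals is a small compile-time probe. This is only a sketch: narrowLiteralsAreUtf8() is a hypothetical helper, not an existing Qt function, and it tests the execution character set only. It uses a universal character name, so the result does not depend on how the source file itself is saved.

```cpp
// Hypothetical helper (not part of Qt): true when this compiler, with the
// current flags, encodes narrow string literals as UTF-8.
// "\u00e9" (e with acute accent) is written as a universal character name,
// so only the execution character set is tested, not the source encoding.
constexpr bool narrowLiteralsAreUtf8()
{
    return sizeof("\u00e9") == 3                               // 2 bytes + NUL
        && static_cast<unsigned char>("\u00e9"[0]) == 0xC3
        && static_cast<unsigned char>("\u00e9"[1]) == 0xA9;
}
```

Used as static_assert(narrowLiteralsAreUtf8(), "..."), it stops the build when the execution character set is not UTF-8, which is what I would expect from MSVC without /utf-8 on a system with a legacy code page.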

Cheers,
Lars

On 10/05/16 20:30, Thiago Macieira <thiago.macieira at intel.com> wrote:

>I've just found out that MSVC 2015 Update 2 added the compiler option /utf-8, 
>which is documented as:
>
>	/utf-8 set source and execution character set to UTF-8
>
>See https://blogs.msdn.microsoft.com/vcblog/2016/02/22/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/
>
>I'd like to turn that option on for at least all of our source code. Once we 
>can drop support for MSVC versions older than 2015 Update 2, we'll be able to 
>write proper Unicode literals on all systems.
>
>The question is: do I turn it on for user code too? Or just ours?
>
>See also the bottom of the blog, that says:
>> In a future major release of the compiler, we would like to change default
>> handling of BOM-less files to assume UTF-8
>
>Proof:
>
>The following file is encoded in UTF-8, as required for Qt sources.
>
>$ cat test.cpp
>extern "C" {
>extern const char s[] = "Résumé";
>extern const char s8[] = u8"Résumé";
>extern const char16_t su[] = u"Résumé";
>extern const wchar_t sl[] = L"Résumé";
>}
>
>Compiled with the /utf-8 option, the data is in the form we'd expect:
>s       DB      'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
>        ORG $+7
>s8      DB      'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
>        ORG $+7
>su      DB      'R', 00H, 0e9H, 00H, 's', 00H, 'u', 00H, 'm', 00H, 0e9H, 00H
>        DB      00H, 00H
>        ORG $+2
>sl      DB      'R', 00H, 0e9H, 00H, 's', 00H, 'u', 00H, 'm', 00H, 0e9H, 00H
>        DB      00H, 00H
>
>Which is the same as what we've got on Linux and OS X with Clang and GCC 
>(assembly listing from Clang on Linux):
>
>s:
>        .asciz  "R\303\251sum\303\251"
>        .size   s, 9
>
>        .type   s8,@object              # @s8
>        .globl  s8
>s8:
>        .asciz  "R\303\251sum\303\251"
>        .size   s8, 9
>
>        .type   su,@object              # @su
>        .globl  su
>        .align  2
>su:
>        .short  82                      # 0x52
>        .short  233                     # 0xe9
>        .short  115                     # 0x73
>        .short  117                     # 0x75
>        .short  109                     # 0x6d
>        .short  233                     # 0xe9
>        .short  0                       # 0x0
>        .size   su, 14
>
>Without that option, we had:
>
>s       DB      'R', 0c3H, 0a9H, 'sum', 0c3H, 0a9H, 00H
>        ORG $+7
>s8      DB      'R', 0c3H, 083H, 0c2H, 0a9H, 'sum', 0c3H, 083H, 0c2H, 0a9H
>        DB      00H
>        ORG $+3
>su      DB      'R', 00H, 0c3H, 00H, 0a9H, 00H, 's', 00H, 'u', 00H, 'm', 00H
>        DB      0c3H, 00H, 0a9H, 00H, 00H, 00H
>        ORG $+6
>sl      DB      'R', 00H, 0c3H, 00H, 0a9H, 00H, 's', 00H, 'u', 00H, 'm', 00H
>        DB      0c3H, 00H, 0a9H, 00H, 00H, 00H
>
>The above contains Mojibake for the s8, su and sl variables.
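
The s8 bytes in the listing without /utf-8 are what you get when the UTF-8 source bytes are read as single-byte characters and then encoded to UTF-8 a second time. The sketch below reproduces that. It assumes the compiler fell back to a code page in which 0xC3 and 0xA9 map to U+00C3 and U+00A9 (true for Latin-1 and Windows-1252; the original mail does not say which code page was active), and latin1ToUtf8() is a hypothetical helper written for this illustration.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as a Latin-1 code point
// (U+0000..U+00FF) and encode it as UTF-8. Feeding it text that is already
// UTF-8 produces the double-encoded form seen in the listing.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

Passing the nine source bytes of "Résumé" minus the terminator (R, C3 A9, s, u, m, C3 A9) through latin1ToUtf8() turns each C3 A9 pair into C3 83 C2 A9, matching the s8 line above.
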
>-- 
>Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel Open Source Technology Center