[Interest] Using UTF-8 code page with Qt5 on Windows?

Thiago Macieira thiago.macieira at intel.com
Wed May 18 19:10:53 CEST 2022


On Wednesday, 18 May 2022 05:29:41 PDT Alvin Wong wrote:
> I am considering enabling UTF-8 as the activeCodePage on Windows
> (supported on Windows Version 1903 and beyond) [1] for Krita to
> improve our situation with using Unicode file paths when interacting
> with external C/C++ libraries. As I have not found any existing
> discussions on this topic, I am now investigating how Qt (5.12 in our
> case) would be affected under this configuration.
> 
> I suspect that, since QString uses UTF-16 and Qt should already be
> using the -W version of Windows API, it should for the most part not
> affect the operations of Qt. As far as I know, the only component that
> would be affected is the system QTextCodec (qwindowscodec.cpp), which
> is also used by QString::fromLocal8Bit and QString::toLocal8Bit.
> Because it uses WideCharToMultiByte and MultiByteToWideChar with
> CP_ACP, when activeCodePage is set to UTF-8, CP_ACP refers to UTF-8
> instead of the system ACP (e.g. Windows-1252, Big5, Shift JIS, ...).

Hello Alvin

Qt uses almost exclusively the W versions of the Win32 API. There are a couple 
of cases of A use, but those are the exception and you don't have to worry 
about them. Those and the "System" codec would be affected by your switch to a 
different codepage, but that's your intention anyway and I don't see a 
problem.

> In theory it should just work, but when reviewing qwindowscodec.cpp I
> noticed code [2] that seems like it assumes the MBCS has only two
> bytes maximum per character, which is not true for UTF-8 (in which a
> Unicode code point can be composed of up to 4 UTF-8 code units). The
> same code exists in Qt 6, just moved to a different location [3]. As I
> am not familiar with how QTextCodec works, I cannot quite tell if this
> is a real issue or not. Can anyone here give some advice?

I think you're right and that code definitely looks fishy.
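
To make the concern concrete, here is a minimal sketch in portable C++. It
is not the qwindowscodec.cpp code: utf8SequenceLength, oneSavedByteSuffices
and checkExamples are names made up for illustration, and the "one saved
byte" model is only an assumption about what a two-bytes-per-character
decoder state amounts to. It shows that a UTF-8 lead byte can announce up
to three more bytes, so remembering a single byte between calls is enough
only for 1- and 2-byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence introduced by `lead`, or 0 if
// `lead` cannot start a sequence (continuation byte or invalid value).
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx, up to U+10FFFF
    return 0;
}

// Hypothetical model of a DBCS-style decoder state that can remember at
// most ONE byte between calls. Returns true if that is enough to resume
// a character whose bytes arrive one at a time.
bool oneSavedByteSuffices(unsigned char lead)
{
    const std::size_t len = utf8SequenceLength(lead);
    return len != 0 && len <= 2;
}

// Sanity check: the announced length matches the real encoded length.
void checkExamples()
{
    const std::string eAcute = "\xC3\xA9";          // U+00E9
    const std::string euro   = "\xE2\x82\xAC";      // U+20AC
    const std::string emoji  = "\xF0\x9F\x98\x80";  // U+1F600
    for (const std::string &s : {eAcute, euro, emoji})
        assert(utf8SequenceLength(static_cast<unsigned char>(s[0])) == s.size());
}
```

With this model, oneSavedByteSuffices(0xC3) is true, but it is false for
0xE2 and 0xF0, which are the lead bytes of 3- and 4-byte sequences.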

And it looks like we lost the tst_Utf8 test in the QStringConverter change, 
particularly tst_Utf8::charByChar, which might have caught this. It looks like 
tst_QStringConverter does not attempt to test the system codec very well 
either, because we can't statically know what it can do. tst_Utf8 had a 
check to detect whether the system codec happened to be UTF-8.

Since you're still on 5.12, tst_Utf8 (qtbase/tests/auto/codecs/utf8) is there. 
Can you try to run it with the UTF-8 codepage and see if it passes?
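
For reference, this is roughly the property a char-by-char test exercises:
feeding a stateful decoder one byte at a time must give the same result as
decoding the whole buffer at once. The sketch below is plain C++, not the
tst_Utf8 source and not Qt's codec; StreamingUtf8Decoder and the helper
functions are hypothetical names. In Qt 5 the state-carrying object would be
a QTextDecoder obtained from QTextCodec::makeDecoder().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stateful UTF-8 decoder. Illustrative only: it does not reject
// overlong forms or surrogates the way a production decoder must.
class StreamingUtf8Decoder
{
public:
    // Decodes `len` bytes and returns the complete code points found.
    // An incomplete trailing sequence stays in the state for the next call.
    std::u32string feed(const char *data, std::size_t len)
    {
        std::u32string out;
        for (std::size_t i = 0; i < len; ++i) {
            const unsigned char b = static_cast<unsigned char>(data[i]);
            if (remaining_ != 0) {
                if ((b & 0xC0) == 0x80) {            // continuation byte
                    partial_ = (partial_ << 6) | (b & 0x3F);
                    if (--remaining_ == 0)
                        out.push_back(partial_);
                    continue;
                }
                out.push_back(0xFFFD);               // sequence cut short
                remaining_ = 0;                      // treat b as a new lead
            }
            if (b < 0x80)                { out.push_back(b); }
            else if ((b & 0xE0) == 0xC0) { partial_ = b & 0x1F; remaining_ = 1; }
            else if ((b & 0xF0) == 0xE0) { partial_ = b & 0x0F; remaining_ = 2; }
            else if ((b & 0xF8) == 0xF0) { partial_ = b & 0x07; remaining_ = 3; }
            else                         { out.push_back(0xFFFD); }  // stray byte
        }
        return out;
    }

private:
    char32_t partial_ = 0;  // bits collected so far
    int remaining_ = 0;     // continuation bytes still expected (0..3)
};

std::u32string decodeAtOnce(const std::string &bytes)
{
    StreamingUtf8Decoder d;
    return d.feed(bytes.data(), bytes.size());
}

std::u32string decodeCharByChar(const std::string &bytes)
{
    StreamingUtf8Decoder d;  // one decoder, so state survives between bytes
    std::u32string out;
    for (char c : bytes)
        out += d.feed(&c, 1);
    return out;
}

// The char-by-char property, on 1-, 2-, 3- and 4-byte sequences.
void charByCharCheck()
{
    const std::string s = "a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80";
    assert(decodeCharByChar(s) == decodeAtOnce(s));
}
```

A decoder whose state holds only one pending byte would pass this for "a"
and U+00E9 but could not pass it for U+20AC or U+1F600, which is why
running the real test under the UTF-8 codepage is worth doing.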

> I would also like to ask if Qt will officially support using UTF-8 as
> the ACP on Windows.

As far as I know, it already does. The Vietnamese locale for Windows has been 
using UTF-8 for years (probably since forever) and there's no reason that Qt 
shouldn't support it. Whether there are bugs or not is a different story, of 
course.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
