[Interest] Using UTF-8 code page with Qt5 on Windows?

Alvin Wong alvinhochun at gmail.com
Wed May 18 14:29:41 CEST 2022


Hi,

I am considering enabling UTF-8 as the activeCodePage ^ on Windows
(supported on Windows 10 version 1903 and later) [1] for Krita, to
improve our handling of Unicode file paths when interacting with
external C/C++ libraries. As I have not found any existing
discussions on this topic, I am now investigating how Qt (5.12 in our
case) would be affected under this configuration.

I suspect that, since QString uses UTF-16 and Qt should already be
using the -W variants of the Windows API, this setting should for the
most part not affect how Qt operates. As far as I know, the only
component that
would be affected is the system QTextCodec (qwindowscodec.cpp), which
is also used by QString::fromLocal8Bit and QString::toLocal8Bit.
Because it calls WideCharToMultiByte and MultiByteToWideChar with
CP_ACP, setting activeCodePage to UTF-8 means that CP_ACP resolves to
UTF-8 instead of the system ACP (e.g. Windows-1252, Big5, Shift JIS,
...).
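
To illustrate what changes for local 8-bit conversions, here is a
minimal, portable sketch. It does not call Qt or the Windows API; the
two helper names are hypothetical, and the Windows-1252 encoder is a
deliberate simplification that only handles the ranges where
Windows-1252 coincides with Latin-1 and substitutes '?' for everything
else (the real conversion may pick a different replacement).

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8.
// Sketch only: surrogates and values above U+10FFFF are not rejected.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simplified stand-in for a Windows-1252 encoder (hypothetical helper).
// Only ASCII and U+00A0..U+00FF are mapped, where Windows-1252 matches
// Latin-1; anything else becomes '?'.
std::string encodeLatinSubsetOf1252(char32_t cp)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return std::string(1, static_cast<char>(cp));
    return "?";
}
```

With these helpers, U+00E9 becomes the single byte 0xE9 under the
Windows-1252 stand-in but the two bytes 0xC3 0xA9 under UTF-8, and
U+4E2D is lost as '?' in the former while UTF-8 encodes it as three
bytes. That loss is the file path problem a UTF-8 ACP is meant to
avoid.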

In theory it should just work, but when reviewing qwindowscodec.cpp I
noticed code [2] that seems to assume the MBCS uses at most two bytes
per character, which is not true for UTF-8 (in which a Unicode code
point can be encoded as up to 4 UTF-8 code units). The same code
exists in Qt 6, just moved to a different location [3]. As I am not
familiar with how QTextCodec works, I cannot quite tell whether this
is a real issue. Can anyone here give some advice?
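
The concern can be made concrete with a small portable sketch. This is
not Qt's code, and both function names are hypothetical. It reports how
many bytes at the end of a chunk belong to a UTF-8 sequence that is not
yet complete. In a double-byte code page that count can be at most 1,
whereas in UTF-8 it can be as high as 3, so a streaming decoder that
only carries a single pending byte between chunks would not be enough.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of a UTF-8 sequence, judged from the bit pattern of
// its lead byte only. Returns 0 for a continuation byte or a byte that
// cannot start a sequence. Overlong and out-of-range forms are not
// rejected in this sketch.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Number of bytes at the end of `chunk` that form an incomplete UTF-8
// sequence, i.e. bytes a streaming decoder would have to carry over to
// the next chunk.
std::size_t pendingTrailingBytes(const std::string &chunk)
{
    const std::size_t n = chunk.size();
    // An incomplete sequence is at most 3 bytes long, so look back at
    // most 3 bytes for a lead byte.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(chunk[n - back]);
        if ((c & 0xC0) == 0x80)
            continue;                      // continuation byte, keep looking
        const std::size_t len = utf8SequenceLength(c);
        return (len > back) ? back : 0;    // pending only if bytes are missing
    }
    return 0;
}
```

For example, a chunk that ends with the first three bytes of U+1F600
(0xF0 0x9F 0x98) leaves three bytes pending, which a one-byte carry
buffer could not hold.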

I would also like to ask if Qt will officially support using UTF-8 as
the ACP on Windows.

Best Regards,
Alvin Wong

---

^ Note: One way of setting activeCodePage to UTF-8 is through the
application manifest, which applies the option per process [1].
Another way is to enable it system-wide by turning on the option
"Beta: Use Unicode UTF-8 for worldwide language support" in Region
Settings [4].
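
Whichever route is used, the result can be checked at run time. The
following is a minimal sketch: GetACP() is the Win32 call that returns
the active code page, and 65001 is the value of CP_UTF8. The helper
names and the constant name are hypothetical, and the Windows-specific
part is guarded so that the rest compiles elsewhere.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Numeric value of CP_UTF8 on Windows.
constexpr unsigned kUtf8CodePage = 65001;

// True if the given code page identifier denotes UTF-8.
bool isUtf8CodePage(unsigned codePage)
{
    return codePage == kUtf8CodePage;
}

#ifdef _WIN32
// True if the current process has UTF-8 as its ANSI code page, whether
// through the manifest or through the system-wide setting.
bool processUsesUtf8Acp()
{
    return isUtf8CodePage(GetACP());
}
#endif
```

An application could call processUsesUtf8Acp() at startup and log the
result, which helps confirm that the manifest entry was actually
picked up.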

[1]: https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
[2]: https://invent.kde.org/qt/qt/qtbase/-/blob/v5.12.12/src/corelib/codecs/qwindowscodec.cpp#L71-96
[3]: https://invent.kde.org/qt/qt/qtbase/-/blob/ae765813d082d403889d2f98a9c21bd9628cdd58/src/corelib/text/qstringconverter.cpp#L1247-1272
[4]: https://stackoverflow.com/questions/56419639/what-does-beta-use-unicode-utf-8-for-worldwide-language-support-actually-do

