[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

Thiago Macieira thiago.macieira at intel.com
Mon Apr 17 18:16:42 CEST 2023


On Monday, 20 March 2023 08:44:30 CDT Edward Welbourne wrote:
> Thiago Macieira (31 October 2019 22:11) wrote [0]:
> > This RFC (...) is meant to discuss how we'll deal with locales on Unix
> > systems on Qt 6. This does not apply to Windows because on Windows we
> > cannot reasonably be expected to use UTF-8 for the 8-bit encoding.
> 
> [0]
> https://lists.qt-project.org/pipermail/development/2019-October/037791.html
> 
> The GNU make mailing list currently has a thread (starts at [1]) about
> handling of encodings on Windows.
> 
> [1] https://lists.gnu.org/archive/html/bug-make/2023-03/msg00066.html
> 
> The discussion there seems to indicate that setting the system code-page
> to UTF-8 can be done in a way that interoperates gracefully with other
> processes and the file system, presumably thanks to the system being
> substantially UTF-16-based, so all 8-bit encodings go via that anyway.

That only works for the file names, not the file contents and other channels. 
For QProcess, we're slightly fortunate that we have a UTF-16 API, so the 
encoding that the other application uses for its command line is irrelevant 
for us.

But anything that goes through QIODevice::read or write (QProcess, QFile, 
Q{Udp,Tcp,Local}Socket) will suffer if there's no agreement on what that 
encoding is. For sockets, the protocol is usually binary, which obviates the 
problem. For files, some file formats help by declaring their encoding. But 
for communicating with another process in particular, there's no reliable way.

> The means to achieve this appear [2] to hinge on setting the active
> codepage for the application in a manifest file, that it gets combined
> with after it is linked.
> 
> [2]
> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

That was already known at the time, in 2019. What has changed is that the 
Windows API has matured to the point that this is now a viable choice 
(previously, it was experimental and known to cause issues). But it's still an 
application choice; we can't enforce it.
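Because the opt-in is an application choice, an application may want to check 
at start-up whether its 8-bit encoding really ended up as UTF-8, for instance 
to warn the user. The following is a minimal C++ sketch under stated 
assumptions, not a Qt API: the helper name narrowEncodingIsUtf8() is 
hypothetical; on Windows it assumes the manifest opt-in is reflected by 
GetACP() returning CP_UTF8 (65001), and on Unix it reads the codeset of the 
current locale with nl_langinfo(CODESET).

```cpp
#include <cassert>   // only needed by the usage check described below
#include <clocale>   // setlocale(), used by callers before the Unix check
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

// Hypothetical helper (not part of Qt): reports whether this process's
// 8-bit ("narrow") encoding is UTF-8.
bool narrowEncodingIsUtf8()
{
#ifdef _WIN32
    // Assumption: when the application manifest contains
    //   <activeCodePage>UTF-8</activeCodePage>
    // and the Windows version honours it, the process ANSI code page
    // reported by GetACP() is CP_UTF8 (65001).
    return GetACP() == CP_UTF8;
#else
    // On Unix the narrow encoding is the codeset of the current LC_CTYPE
    // locale, so the answer depends on what setlocale() last selected.
    const char *codeset = nl_langinfo(CODESET);
    return codeset != nullptr && std::strcmp(codeset, "UTF-8") == 0;
#endif
}
```

For example, on a typical glibc system, calling std::setlocale(LC_ALL, "C") 
makes the Unix branch return false, while "C.UTF-8", where that locale is 
installed, makes it return true. A warning rather than a hard failure matches 
the point that this remains the application's choice.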

> There do appear to be some vagaries still, it may depend on UCRT and I'm
> not sure I've really understood it all, but it looks like we may, in
> time, be able to consistently use UTF-8 as 8-bit encoding on Windows.

Sorry, no, we can't force users to do it because we don't know if their code 
is safe when the 8-bit encoding is UTF-8.

But I think we should:
a) do it for our own applications, since we do know our own code
b) advise users somehow that they should opt-in to this
c) decide if we want to change from opt-in to opt-out in the medium term (7.0 
  for example)
d) decide if we want to enforce it in the long-term

Option (d) depends on (c). Option (c) informs whether we need a Qt CMake API 
or whether we can simply say upstream CMake should handle it.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering