[Development] RFC: Defaulting to or enforcing UTF-8 locales on Windows

Thiago Macieira thiago.macieira at intel.com
Tue Apr 18 14:37:45 CEST 2023


On Tuesday, 18 April 2023 00:46:26 PDT Lars Knoll wrote:
> > But anything that goes through QIODevice::read or write (QProcess, QFile,
> > Q{Udp,Tcp,Local}Socket) will suffer if there's no agreement on what that
> > encoding is. Usually for sockets, the protocol is binary and obviates the
> > problem. For files, some file formats help. But in particular for
> > communicating with another process, there's no reliable way.
> 
> Communicating through a socket will always require that both sides agree on
> the encoding. That’s not really anything new.
> 
> The question is how they encode the data when writing to the socket. If they
> use QTextStream, the data will by default get written in utf8 already today
> (since Qt 6.0). If they explicitly convert the QString to and from a
> specific encoding using QStringConverter/QTextCodec nothing bad will happen
> either.
> 
> So the remaining problem comes when they use QString::to/fromLocal8Bit(), as
> that might change from some Windows locale to utf8. Not a problem when
> communicating with a socket between two Qt apps, but might be an issue when
> storing data in a file or communicating with an app that doesn’t use Qt.
> 
> But we could consider that a user error, as you really shouldn’t use
> local8bit for anything other than stdin/out and interfacing with 8bit system
> APIs.

Please don't focus on sockets, as we all agree the protocol will usually 
inform what the encoding is. Instead, let's focus on QProcess.

Here's a test: write an application that displays in a GUI the output of:

  QProcess proc;
  proc.start("cmd.exe", { "/c", "dir" });

This is an uncommon scenario, but it is representative of any application that 
is or simulates a terminal. If you want to have a more realistic version of 
the above, replace "dir" with "nmake" or "ninja": all three will print the 
names of files.

Conversely, write the application that keeps its output unmodified so it can be 
consumed by its current consumers.
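
To make the ambiguity concrete, here is a minimal sketch in plain standard
C++ (deliberately not Qt) of why the receiving side cannot tell the encoding
from the bytes alone. The function names decodeLatin1 and decodeUtf8 are
hypothetical, and Latin-1 is only an assumed stand-in for whatever legacy
8-bit code page the child process happens to use:

```cpp
#include <cstddef>
#include <string>

// Interpret every byte as the code point of the same value (ISO-8859-1).
// This never fails, which is exactly why it cannot detect a wrong guess.
std::u32string decodeLatin1(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char c : bytes)
        out.push_back(static_cast<char32_t>(c));
    return out;
}

// Strict UTF-8 decoder: returns false on truncated, overlong, surrogate or
// out-of-range sequences; otherwise stores the code points in `out`.
bool decodeUtf8(const std::string &bytes, std::u32string &out)
{
    // Smallest code point that legitimately needs N continuation bytes.
    static const char32_t minForExtra[] = { 0x0, 0x80, 0x800, 0x10000 };
    out.clear();
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false;              // stray continuation or invalid lead byte

        if (bytes.size() - i <= extra)
            return false;               // sequence is cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;           // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minForExtra[extra] || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;               // overlong, out of range, or surrogate
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}
```

Feeding the five bytes "caf\xC3\xA9" to both functions succeeds in both
cases, yet decodeUtf8 yields four code points ending in U+00E9 while
decodeLatin1 yields five ending in U+00C3 U+00A9. The four bytes "caf\xE9"
go the other way: they decode fine as Latin-1 but are rejected as UTF-8. A
failed UTF-8 decode is a usable hint, but a successful one proves nothing,
so the two processes still need to agree on the encoding out of band.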

> We did enforce it on Unix systems though with Qt 6. I do believe we can over
> time enforce it on Windows as well, or at least make it the default.

In time, I agree. But we are right now where Unix was in 2003-2005, and with 
differences. For Unix systems, there's no UTF-16 API, so the equivalent 
commands of the above could afford to be encoding-agnostic, so they were a 
pass-through of what the filesystem offered. In fact, it was only Qt 
applications that had problems because we converted to UTF-16 back in 3.0 
(since 2.0) -- that is STILL a complaint we've often heard about our FS API.

> > But I think we should:
> > a) do it for our own applications, since we do know our own code
> > b) advise users somehow that they should opt-in to this
> > c) decide if we want to change from opt-in to opt-out in the medium term
> > (7.0 for example)
> > 
> > d) decide if we want to enforce it in the long-term
> > 
> > Option (d) depends on (c). Option (c) informs whether we need a Qt CMake
> > API or whether we can simply say upstream CMake should handle it.
> 
> I think this should be the goal, but I’d vote for a slightly faster
> schedule.
>
> (a) and (b) are things we should be able to do right now. All our apps work
> fine on Unix systems with a utf8 locale, so there should be relatively few
> problems doing the switch on Windows. The only thing this requires is a bit
> of CMake infrastructure work (that I believe has been mostly done already),
> and some documentation for our users.
> 
> (c) is something we should also announce with a time schedule right now. I
> would go and do this either for 6.8 or 6.9 (ie with the next LTS release or
> directly afterwards). If we announce it now, it gives our users 1.5 to 2
> years to adopt (and they can always opt out afterwards).

I don't think that's realistic because I think we'll find issues. I think we 
need to do the conversion of our own applications and tools first, figure out 
what the issues are for ourselves, before we make time promises.

I expect we'll need more than 1.5 years of advance notice that the opt-in will 
change to opt-out.

> (d) is something I would do for Qt 7, as that’s the correct time to do those
> changes and clean up our code base

I also think it's unrealistic for the same reason. That's a 4-6 year leniency 
for something that took Unix 17 years, and Unix had a single system-wide 
encoding (Windows has three).

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering