[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

André Pönitz apoenitz at t-online.de
Sat Nov 16 19:50:13 CET 2019

On Fri, Nov 15, 2019 at 05:47:04PM -0800, Thiago Macieira wrote:
> On Friday, 15 November 2019 16:23:24 PST André Pönitz wrote:
> > > The questions are:
> > > 1) do we want to prevent another library from accidentally unsetting it?
> > > 2) do we want child processes to use the same?
> > > 
> > > Note the answers for both questions must be the same, for the solution is
> > > the same. So either both yeses or both nos.
> > 
> > This "answers for both questions must be the same" requirement is arbitrary.
> > 
> > The fact that one known solution results in same answers to both is in
> > no way proof that no other solutions exist.
> I don't see how to prevent another library doing setlocale(LC_ALL, "") from
> overriding Qt's default other than to make setlocale(LC_ALL, "") do what
> we want. Since what it does is read the environment, the only solution is to 
> change the environment.

You haven't even explained why this prevention would be needed, what exact
bad would happen if you don't do that, and you cannot prevent the other library
from setting an explicit locale anyway.

With modifying the environment, you just catch the "" case, one out of many,
and I'll continue to argue that it's not Qt's business to try even that.
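
To make the "one out of many" point concrete, here is a minimal sketch in Python of the lookup that a setlocale(category, "") call performs on POSIX systems. The function names are made up for illustration, and the final fallback to "C" is an assumption about the typical default; the real lookup lives in the C library.

```python
def resolve_from_environment(env, category="LC_CTYPE"):
    # Precedence used for the "" request: LC_ALL wins, then the category's
    # own variable, then LANG. Empty values count as unset.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed default when nothing is set


def effective_locale(env, requested, category="LC_CTYPE"):
    # Only the empty-string request consults the environment. An explicit
    # name such as "C" or "de_DE.ISO-8859-1" is used as given.
    if requested == "":
        return resolve_from_environment(env, category)
    return requested


env = {"LANG": "de_DE.ISO-8859-1", "LC_ALL": "C.UTF-8"}
print(effective_locale(env, ""))   # C.UTF-8, taken from LC_ALL
print(effective_locale(env, "C"))  # C, the environment is never looked at
```

So a library that passes an explicit name is not affected by whatever was put into the environment.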

> > > Qt 6 will not have support for non-UTF-8 codecs, outside of Windows. You
> > > can either deal with binary data or with UTF-8 text, there's no middle
> > > ground.
> > Now that's an interesting twist.
> > 
> > The latest memo I did (not...) get was that codecs are to be moved into a
> > separate module. Which is actually ok, as it allows user code using codecs
> > to live on with minimal changes, and makes QtCore slimmer, kind of "no-loss
> > + win".
> Sure. But that's no different than using ICU or writing your own code to 
> convert from binary to text. QString will not support it on its own.

> > "Qt 6 will not have support for non-UTF-8 codecs, outside of Windows" is
> > definitely news to me. I've not seen this being discussed, neither here nor
> > within the part of the company that I usually talk to.
> You just said yourself, above.

I did not say that.

> If QTextCodec moves to another library, we have  no codecs in QtCore.

Not having codecs in QtCore does not mean QtCore cannot use codecs.

One could have a setup where Qt Core just has the bare minimum, with stubs
for other codecs that are used when that QtCodecs lib is linked.

Actually, that is what I had expected to be the targeted
solution once I heard that text codecs move out of QtCore.
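
A rough sketch of such a setup, in Python for brevity. This is purely hypothetical and not Qt API: the names register_decoder, decode and load_codecs_addon are invented, and load_codecs_addon merely stands in for linking a separate codecs library.

```python
# "Core" part: knows only UTF-8, plus a hook that others can fill in.
_decoders = {"utf-8": lambda data: data.decode("utf-8")}


def register_decoder(name, func):
    _decoders[name.lower()] = func


def decode(data, name="utf-8"):
    decoder = _decoders.get(name.lower())
    if decoder is None:
        raise LookupError("codec %r not available" % name)
    return decoder(data)


# "Add-on" part: registers further codecs when it is loaded.
def load_codecs_addon():
    register_decoder("iso-8859-1", lambda data: data.decode("latin-1"))


print(decode(b"\xc3\xa1"))  # UTF-8 works without the add-on
```

User code keeps calling decode() either way. Whether a non-UTF-8 name resolves depends only on whether the add-on has been loaded.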

> > So when and where was this decision made, by whom, and why?
> > 
> > Did that person bother to check e.g. whether Qt Creator uses non-UTF-8
> > codecs in some cases and did that person come to the conclusion that any
> > such use is bad and deserves to die?
> Probably not. Why does Qt Creator need other codecs?

My guess would be to handle code bases that are not (a subset of) UTF-8.

> > > you're arguing that there are broken applications that won't handle
> > > C.UTF-8 correctly, without giving a single example.
> > 
> > ... is of course not true:
> > 
> > 1. I did not claim there were "broken" applications that won't handle
> >    C.UTF-8 "correctly", I claimed that there are applications that react
> >    differently to C.UTF-8.
> Different behaviour is *exactly* what we want. We want this:

Who is 'we'?

> $ LC_ALL=C.UTF-8 ls á
> ls: cannot access 'á': No such file or directory
> not this:
> $ LC_ALL=C ls á
> ls: cannot access ''$'\303\241': No such file or directory

If you do not touch the environment, the user gets what he asked for.

He will most likely not want to see ''$'\303\241', but if he explicitly asks
for it in the environment he sets up, it's not Qt's job to override this.
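
For reference, the \303\241 in the second output is nothing more than the two UTF-8 bytes of 'á' written as octal escapes. A small Python sketch, where the helper name octal_escape is hypothetical and the logic is a simplification, not the quoting code ls actually uses:

```python
def octal_escape(text):
    # Encode as UTF-8 and render every non-ASCII byte as a backslash
    # followed by three octal digits. ASCII bytes are passed through.
    out = []
    for byte in text.encode("utf-8"):
        out.append(chr(byte) if byte < 0x80 else "\\%03o" % byte)
    return "".join(out)


print(octal_escape("\u00e1"))  # \303\241
```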

> I thought the argument would be that despite being what we wanted,

Who is 'we'?

> it would break certain scenarios. But I haven't seen any examples of breakage.
> >      gcc produces different output under C and C.UTF-8:
> > 
> >      echo x | LC_CTYPE=C gcc -xc -
> >       <stdin>:1:1: error: expected '=', ',', ';', 'asm' or '__attribute__' at end of input
> > 
> >      echo x | LC_CTYPE=C.UTF-8 gcc -xc -
> >       <stdin>:1:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ at end of input
> > 
> >      As an additional twist, this different behaviour does not require fancy
> >      input, input is plain ASCII in both cases.
> > 
> >      Output parsers expecting "'" e.g. to produce a set of recommendations how
> >      to quick-fix such problems in an IDE will break.
> Any application that is parsing GCC output is already setting LC_ALL in the 
> child process's environment.

Not necessarily, and if so, it's rather 'C', not 'C.UTF-8'.
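
The effect on a parser is easy to reproduce without gcc. Below is a sketch in Python using the two messages quoted above; the pattern names are invented, and the "naive" pattern is only an assumption about what such a parser might look like.

```python
import re

# Naive pattern: only understands the ASCII apostrophe.
ASCII_ONLY = re.compile(r"'([^']*)'")
# Tolerant pattern: also accepts the typographic quotes U+2018 and U+2019.
EITHER = re.compile("['\u2018]([^'\u2018\u2019]*)['\u2019]")

c_msg = "error: expected '=', ',', ';', 'asm' or '__attribute__' at end of input"
utf8_msg = ("error: expected \u2018=\u2019, \u2018,\u2019, \u2018;\u2019, "
            "\u2018asm\u2019 or \u2018__attribute__\u2019 at end of input")

print(ASCII_ONLY.findall(c_msg))     # the five expected tokens
print(ASCII_ONLY.findall(utf8_msg))  # empty list, the parser finds nothing
print(EITHER.findall(utf8_msg))      # the five expected tokens again
```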

> Otherwise, they'd be getting possibly translated 
> messages and we all know that the order of the messages could be different. 
> Not to mention that instead of "" or even “” we could see «» or „“.

Also, the point here is not the particular case. Each particular case
can be fixed or worked around. The point here is that changing the environment
affects processes spawned from the direct children, and it's usually not
even known what subprocesses there are let alone whether they are sensitive
to such change.
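
A small Python sketch of that propagation: the value set for the direct child is what the grandchild ends up seeing, without the child doing anything about it. The helper name grandchild_lc_all is made up, and the two inline scripts merely stand in for arbitrary tools.

```python
import os
import subprocess
import sys

# Grandchild: reports the LC_ALL value it was started with.
GRANDCHILD = "import os; print(os.environ.get('LC_ALL', '<unset>'))"
# Child: knows nothing about locales, it just starts the grandchild.
CHILD = (
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', %r], check=True)" % GRANDCHILD
)


def grandchild_lc_all(value):
    # Change the environment for the direct child only.
    env = dict(os.environ)
    env["LC_ALL"] = value
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(grandchild_lc_all("C"))  # the grandchild inherits the value
```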

> [...]
> If you're telling me that you're setting the environment before the Qt 
> application to cope with its brokenness, I will ask why that application 
> hasn't been fixed in the 16 years since UTF-8 environments became a thing.

Because a lot of code is in a state where it's better not touched, or there
are no resources to do that.

Recoding Latin1 to UTF-8 sources is _not_ harmless in a world of strcat
and strcmp and fixed sized buffers.
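
The size part of this is simple arithmetic: every non-ASCII Latin1 character takes one byte in Latin1 and two bytes in UTF-8. A sketch in Python, where the 8-byte buffer and the helper name fits are assumptions chosen for illustration:

```python
BUFFER_SIZE = 8  # hypothetical fixed-size char buffer


def fits(text, encoding, size=BUFFER_SIZE):
    # A C string needs one extra byte for the terminating NUL.
    return len(text.encode(encoding)) + 1 <= size


text = "Gr\u00fc\u00dfe!"  # "Grüße!", two non-ASCII characters
print(len(text.encode("latin-1")), fits(text, "latin-1"))  # 6 True
print(len(text.encode("utf-8")), fits(text, "utf-8"))      # 8 False
```

The same string that fit comfortably before recoding no longer fits afterwards, and code using strcat on such buffers has no way of noticing.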

"Becoming a thing" does not mean "everyone uses it".

> And we can provide a way to force Qt not to set the environment, for
> those weird cases where you must deal with broken,

Calling functioning code "broken" just because it does not fit into your
world is starting to get on my nerves.

The code is *not* broken. You are just set to break it.

And if you go down that road, the consequence will not be that this code
will be changed, but rather that this code will never see a Qt 6 GUI.

> until the heat death of the Universe. And I will ask why everyone else must 
> pay a performance price for the sake of those old, broken applications that 
> even the maintainer isn't fixing anymore?

Which price exactly?

Anyway. Forget it. I am giving up here.

You have not yet answered 

  - why this decision was made
  - who did it
  - what the actual problem to solve was
  - why LC_*ALL* comes into play

and it doesn't look like you want to answer them.

I get the impression that this thread was not started as an RFC for an
open-ended discussion, but as a staged attempt to provide a figleaf for
a pre-determined decision.

Sad to see that in the official decision making process of the Qt Project.

