[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

André Pönitz apoenitz at t-online.de
Mon Nov 4 20:47:18 CET 2019

On Mon, Nov 04, 2019 at 10:55:03AM -0800, Thiago Macieira wrote:
> On Monday, 4 November 2019 10:29:16 PST André Pönitz wrote:
> > All but one do not let the UI user change the environment, i.e. the
> > environment is passed through the Qt UI process (so far). The one is
> > Qt Creator, but even there it is not possible to configure all child
> > processes, and it would not be tolerable to tell users "When you create a
> > new run configuration remember to undo spurious environment changes done
> > by Qt".
>
> It's highly unlikely you're running Qt Creator in a non-UTF-8 environment in 
> the first place.


   > locale | grep -q '=C$' && echo oops

> KDE has not supported such locales for 15 years.

I haven't tried to run KDE in earnest for about the same time. 

> If we were in 2004-2006, when this was recent and non-UTF-8 could still be
> in use on other Unix environments like Solaris and HP-UX, I could understand
> the skepticism.
>
> > <rant>
> > There _are_ setups that _are_ set in stone, that are not connected
> > to anything and that don't give anything on updates, or do not even
> > have the possibility to be "fixed" or changed in any way.
>
> Why are you inserting Qt 6 into them, then?

Because data generation and data visualization are different tasks that
can, and perhaps should, be done in different processes, and while data
visualization occasionally might need to react to user demand, data generation
might not.

> > Looks contrived? [Check your hard disk before you answer.]
> I'll do a full search on Clear Linux to see if there's any software that 
> checks the return value of setlocale().
>
> > Potentially harmful behaviour should always be opt-in, not opt-out
> > (and never be non-configurable).
>
> I don't disagree on the statement. I just disagree on whether it's harmful. 
> *Not* calling qputenv could be harmful too.

As mentioned in the second example, even "clean ASCII" 7-bit input produces
different results under "C.UTF-8" and "C":

 echo x | LC_ALL=C.UTF-8 gcc -xc -
 echo x | LC_ALL=C  gcc -xc -

Given that most parsers in the world are ad-hoc, chances are high that some
are based on looking for certain quotes, but not for others.
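
To make that failure mode concrete, here is a minimal sketch of such an
ad-hoc parser. The first_quoted helper is hypothetical, and the two messages
are illustrative stand-ins for the two quoting styles (ASCII apostrophes
versus U+2018/U+2019), not captured compiler output.

```shell
#!/bin/sh
# Hypothetical ad-hoc parser: print the first token enclosed in ASCII
# apostrophes, the way a quick sed one-liner in a build script might.
first_quoted() {
    sed -n "s/^[^']*'\([^']*\)'.*/\1/p"
}

# Illustrative messages only (assumed wording, not real tool output).
msg_c="error: expected '=' at end of input"
msg_utf8="error: expected ‘=’ at end of input"

printf '%s\n' "$msg_c"    | first_quoted   # prints: =
printf '%s\n' "$msg_utf8" | first_quoted   # prints nothing: no ASCII quote found
```

The parser does not crash on the second message; it silently finds nothing,
which is the kind of breakage that tends to go unnoticed.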

And even if someone knows that the immediate child processes are ok with
C.UTF-8, their children, grandchildren, ... might not.
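
A minimal sketch of the inheritance involved, assuming plain sh as a
stand-in for the parent process (this is not Qt code; the exported variable
merely plays the role of an environment change made inside the parent):

```shell
#!/bin/sh
# Stand-in for a parent process that changed its own environment.
LC_ALL=C.UTF-8
export LC_ALL

# The intermediate child does nothing about locales, yet the grandchild
# still sees the parent's value.
sh -c 'sh -c "echo grandchild: LC_ALL=\$LC_ALL"'        # grandchild: LC_ALL=C.UTF-8

# Undoing the change has to happen at every launch site that cares.
env LC_ALL=C sh -c 'sh -c "echo grandchild: LC_ALL=\$LC_ALL"'   # grandchild: LC_ALL=C
```

The second invocation works, but only because the launch site knows it has
to override the value; nothing further down the process tree can tell what
the environment looked like before the parent changed it.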

