[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

André Pönitz apoenitz at t-online.de
Mon Nov 4 19:29:16 CET 2019


On Mon, Nov 04, 2019 at 09:40:00AM +0000, Edward Welbourne wrote:
> On Friday, 1 November 2019 12:29:19 PDT André Pönitz wrote:
> >> a) and b) are fine with me, "c) yes" sounds like a potential problem.
> >>
> >> Most of the child processes I usually call are not Qt based,
> 
> That shouldn't matter.  Qt<6-based things and non-Qt things are all the
> same from the point of view of the contemplated change.
> 
> To what extent are these child programs started via a UI that lets the
> user set environment variables (as I assume all IDEs do for most of the
> commands they run) ?

All but one of them do not let the user change the environment from the
UI, i.e. the environment is passed through from the Qt UI process (so far).
The exception is Qt Creator, but even there it is not possible to configure
all child processes, and it would not be tolerable to tell users "When you
create a new run configuration, remember to undo spurious environment
changes made by Qt".

> Obviously, if some antique needs a special locale, that's no problem
> if it's started via a UI that lets one configure its environment,
> overriding what Qt might have set.

Even _if_ that UI let the user configure the environment,
that would not be an excuse.

> >> rather some random unrelated tools, in some cases even quite old
> >> random unrelated tools.
> 
> I read antiquity as tending to assume C locale, so unharmed by C.UTF-8,
> although some may be assuming an ISO Latin or similar legacy codec.
> All the same, so antique as to not grok Unicode at all is pretty old !
> You probably need to update it for security fixes, by now.

"Security reason, because it is old" must be Godwin's Law in 
"Always Online" times.

<rant>
There _are_ setups that _are_ set in stone, that are not connected
to anything and that don't care about updates, or that do not even
have the possibility to be "fixed" or changed in any way.

If Qt development does not want to care about these cases _even as
child processes_, that's fine in principle (even with me), but then
it would help to communicate that clearly, to prevent accidents
in the selection of toolkits.
</rant>

> Thiago Macieira (1 November 2019 22:49)
> > TBH, all the more reason for propagating the choice. Please remember
> > that on any modern Linux or macOS or FreeBSD, they are already running
> > with a UTF-8 locale. The most common scenario of our setting something
> > is when LC_ALL=C was set in the environment, which will cause us to
> > reset it to C.UTF-8.
> 
> Indeed, what program would have problems in C.UTF-8 yet have a
> non-Unicode locale in which it works nicely ?
> An example would help us to reason about this ...

The following works on all my setups (and, btw, with LC_ALL="C"
which I do _not_ use) and crashes with LC_ALL="C.UTF-8":

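    #include <locale.h>
    #include <string.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Take LC_COLLATE from the environment (LC_ALL, LC_COLLATE, LANG)
           and insist on exactly "C". setlocale() returns NULL if the
           requested locale is not available, so check before strcmp(). */
        const char *collate = setlocale(LC_COLLATE, "");
        if (collate == NULL || strcmp(collate, "C") != 0)
            abort();
        return 0;
    }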

Looks contrived? [Check your hard disk before you answer.]

Shotgun-changing environment for child processes is _not_ harmless.
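For contrast, a process can switch its own LC_CTYPE to UTF-8 without touching the environment its children inherit, because setlocale() changes process state, not environ. A minimal sketch, assuming a POSIX system on which the "C.UTF-8" locale exists (not every system ships it) and on which nl_langinfo(CODESET) reports "UTF-8" for UTF-8 locales. The function name is hypothetical, and this is not a description of what Qt actually does:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: give this process a UTF-8 LC_CTYPE without
   modifying environ. Returns 1 if LC_CTYPE already was, or was switched
   to, a UTF-8 locale; 0 if "C.UTF-8" is not available. */
int ensure_utf8_ctype_in_process(void)
{
    /* Start from whatever the user configured (LC_ALL, LC_*, LANG). */
    setlocale(LC_ALL, "");

    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);            /* POSIX: never returns NULL */
    if (strcmp(codeset, "UTF-8") == 0)
        return 1;                       /* already UTF-8: change nothing */

    /* Typical case: LC_ALL=C. Override LC_CTYPE in this process only.
       There is no setenv() call, so child processes are unaffected. */
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL;
}
```

Children started afterwards with fork()/exec() or posix_spawn() and the inherited environ still see the user's original LC_ALL/LANG, so a tool like the one above keeps working.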

> >>>> ones, would not make the same choices. If we do not propagate, we
> >>>> could end up with a child process (often helpers) that make
> >>>> different choices as to what command-line arguments or pipes or
> >>>> contents in files mean.
> 
> >> If we propagate we'll expose the child processes to locales they
> >> might not expect, in circumstances where the user of the system
> >> possibly intentionally chose a non-UTF8-locale to make exactly those
> >> child processes happy.
> 
> > True, but that was done at the expense of running Qt in a largely
> > unsupported and untested scenario. Setting the locale to C means we
> > can't access any file with an 8bit file name; setting to Latin1 would
> > allow that, but produce mojibake in GUI.
> 
> >> Effectively, going for "c) yes" deprives the user of a certain level
> >> of freedom that is needed, "c) no" is less intrusive.
> >>
> >> "c) no" as default and a simple one-liner opt-in for applications
> >> that want to engage in "strict parenting" might be an option, too.
> 
> > How about making the resetting opt-out, instead of opt-in?
> > QT_NO_OVERRIDE_LC_CTYPE?
> 
> Possibly its value could be:
> * all, 1, yes, true, .* - it applies to all child processes [*]; or
> * a list of regexes for program names to which it applies, when started
>   as child processes.

The syntax doesn't really matter, but the direction "opt-out" is wrong.

Potentially harmful behaviour should always be opt-in, not opt-out
(and never be non-configurable).
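For illustration, an opt-in gate using the value syntax Edward suggested could look roughly like this. Everything specific here is an assumption: the variable name EXAMPLE_PROPAGATE_UTF8_LOCALE is hypothetical (it is not a Qt variable), the list separator is assumed to be a comma, and each entry is assumed to be a POSIX extended regex that must match the whole program name. Unset or empty means the child's environment is left alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, for illustration only. */
#define PROPAGATE_VAR "EXAMPLE_PROPAGATE_UTF8_LOCALE"

/* Returns 1 if the POSIX ERE `pattern` matches the whole of `name`. */
static int full_match(const char *pattern, const char *name)
{
    char anchored[512];
    regex_t re;
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return 0;                       /* pattern too long: no match */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                       /* invalid regex: ignore it */
    int matched = regexec(&re, name, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* Opt-in: returns 1 only if the user explicitly asked for the locale
   override to be propagated to the child `program_name`. */
int should_propagate_locale(const char *program_name)
{
    assert(program_name != NULL);
    const char *value = getenv(PROPAGATE_VAR);
    if (value == NULL || *value == '\0')
        return 0;                       /* default: leave children alone */

    const char *all[] = { "all", "1", "yes", "true", ".*" };
    for (size_t i = 0; i < sizeof all / sizeof all[0]; ++i)
        if (strcmp(value, all[i]) == 0)
            return 1;

    /* Otherwise a comma-separated list of regexes (separator assumed). */
    char *list = strdup(value);
    if (list == NULL)
        return 0;
    int result = 0;
    char *saveptr = NULL;
    for (char *pat = strtok_r(list, ",", &saveptr); pat != NULL;
         pat = strtok_r(NULL, ",", &saveptr)) {
        if (full_match(pat, program_name)) {
            result = 1;
            break;
        }
    }
    free(list);
    return result;
}
```

With the variable unset, should_propagate_locale("make") returns 0. With it set to "gcc|ld,make" it returns 1 for "make" and "ld" but 0 for "cmake" and "gold", because each pattern is anchored to the whole name.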

Andre'

