[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

Edward Welbourne edward.welbourne at qt.io
Mon Nov 4 10:40:00 CET 2019


Thiago:
>>>> My personal preference is:
>>>> a) C.UTF-8
>>>> b) override it to force UTF-8 on the same locale
>>>> c) yes

Lars:
>>> I agree with all three choices.

On Friday, 1 November 2019 12:29:19 PDT André Pönitz wrote:
>> a) and b) are fine with me, "c) yes" sounds like a potential problem.
>>
>> Most of the child processes I usually call are not Qt-based,

That shouldn't matter.  Qt<6-based things and non-Qt things are all the
same from the point of view of the contemplated change.

To what extent are these child programs started via a UI that lets the
user set environment variables (as I assume all IDEs do for most of the
commands they run) ?  Obviously, if some antique needs a special locale,
that's no problem if it's started via a UI that lets one configure its
environment, overriding what Qt might have set.

>> rather some random unrelated tools, in some cases even quite old
>> random unrelated tools.

I read antiquity as tending to assume the C locale, so unharmed by
C.UTF-8, although some may be assuming an ISO Latin or similar legacy
codec.  All the same, anything so antique as to not grok Unicode at all
is pretty old !
You probably need to update it for security fixes, by now.

Thiago Macieira (1 November 2019 22:49)
> TBH, all the more reason for propagating the choice. Please remember
> that on any modern Linux or macOS or FreeBSD, they are already running
> with a UTF-8 locale. The most common scenario of our setting something
> is when LC_ALL=C was set in the environment, which will cause us to
> reset it to C.UTF-8.

Indeed, what program would have problems in C.UTF-8 yet have a
non-Unicode locale in which it works nicely ?
An example would help us to reason about this ...
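
For reference while reasoning about such cases, choices (a) and (b)
together amount to roughly the following mapping of locale names.  This
is only a sketch: withUtf8Codeset() is a hypothetical helper, not Qt
API, and it assumes the usual language[_territory][.codeset][@modifier]
naming.  It does not check that the resulting locale (C.UTF-8 in
particular) is actually installed on the system.

```cpp
#include <string>

// Hypothetical helper, not part of Qt: keep the language, territory and
// modifier of a locale name, but force its codeset to UTF-8.
std::string withUtf8Codeset(const std::string &name)
{
    // An empty name, "C" and "POSIX" all select the C locale: choice (a).
    if (name.empty() || name == "C" || name == "POSIX")
        return "C.UTF-8";

    // Split off an optional "@modifier" suffix, e.g. "@euro".
    const std::string::size_type at = name.find('@');
    const std::string modifier =
        (at == std::string::npos) ? std::string() : name.substr(at);
    std::string base = name.substr(0, at);

    // Drop any existing ".codeset" part, e.g. ".ISO-8859-1": choice (b).
    const std::string::size_type dot = base.find('.');
    if (dot != std::string::npos)
        base.erase(dot);

    return base + ".UTF-8" + modifier;
}
```

So LC_ALL=C would become C.UTF-8 and de_DE.ISO-8859-15@euro would become
de_DE.UTF-8@euro, while a name that already uses UTF-8 is unchanged.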

>>>> ones, would not make the same choices. If we do not propagate, we
>>>> could end up with child processes (often helpers) that make
>>>> different choices as to what command-line arguments or pipes or
>>>> contents in files mean.

>> If we propagate we'll expose the child processes to locales they
>> might not expect, in circumstances where the user of the system
>> possibly intentionally chose a non-UTF8-locale to make exactly those
>> child processes happy.

> True, but that was done at the expense of running Qt in a largely
> unsupported and untested scenario. Setting the locale to C means we
> can't access any file with an 8-bit file name; setting to Latin1 would
> allow that, but produce mojibake in the GUI.

>> Effectively, going for "c) yes" deprives the user of a certain level
>> of freedom that is needed, "c) no" is less intrusive.
>>
>> "c) no" as default and a simple one-liner opt-in for applications
>> that want to engage in "strict parenting" might be an option, too.

> How about making the resetting opt-out, instead of opt-in?
> QT_NO_OVERRIDE_LC_CTYPE?

Possibly its value could be:
* all, 1, yes, true, .* - it applies to all child processes [*]; or
* a list of regexes for program names to which it applies, when started
  as child processes.

Or is that too hard to implement at all the places where we call exec()
and its equivalents ?

[*] I'm fairly sure the actual Unix programs yes and true don't care
about locale, so treating them as meaning .* would be harmless ...
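
To gauge how hard the matching itself would be, here is a minimal
sketch of interpreting such a value.  Everything in it is an
assumption: optOutApplies() is a hypothetical name, the list separator
(a colon, as in PATH) has not been agreed on, each regex is taken to
match the whole program name, and an empty value is treated as "no
opt-out".

```cpp
#include <initializer_list>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Qt: does the opt-out described by
// `value` (the content of QT_NO_OVERRIDE_LC_CTYPE) apply to the child
// whose program name is `program` ?
bool optOutApplies(const std::string &value, const std::string &program)
{
    if (value.empty())
        return false; // Unset or empty: assume no opt-out.

    // Keywords meaning "all child processes"; checked first, so they
    // are never treated as regexes.
    for (const char *keyword : {"all", "1", "yes", "true", ".*"}) {
        if (value == keyword)
            return true;
    }

    // Otherwise: a colon-separated list of regexes (assumed separator).
    std::istringstream list(value);
    std::string pattern;
    while (std::getline(list, pattern, ':')) {
        if (pattern.empty())
            continue;
        try {
            if (std::regex_match(program, std::regex(pattern)))
                return true;
        } catch (const std::regex_error &) {
            // Skip a malformed pattern rather than failing the launch.
        }
    }
    return false;
}
```

With QT_NO_OVERRIDE_LC_CTYPE=gdb:old-.* this would leave the locale
alone for gdb and old-tool, but still override it for make.  The
matching is the easy part; calling it at every place that ends up in
exec() is where the real cost would lie.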

	Eddy.

