[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

André Pönitz apoenitz at t-online.de
Sat Nov 2 12:53:10 CET 2019

On Fri, Nov 01, 2019 at 02:49:36PM -0700, Thiago Macieira wrote:
> On Friday, 1 November 2019 12:29:19 PDT André Pönitz wrote:
> > > > My personal preference is:
> > > > a) C.UTF-8
> > > > b) override it to force UTF-8 on the same locale
> > > > c) yes
> > > 
> > > I agree with all three choices.
> > 
> > a) and b) are fine with me, "c) yes" sounds like a potential problem.
> > 
> > Most of the child processes I usually call are not Qt-based, rather some
> > random unrelated tools, in some cases even quite old random unrelated
> > tools.
> TBH, all the more reason for propagating the choice. Please remember that on 
> any modern Linux or macOS or FreeBSD, they are already running with a UTF-8 
> locale.

With that argument we wouldn't even need to change the locale for the
actual Qt application.

I think we are currently discussing the rare case where the Qt application
is started with a non-UTF-8 locale, and the main question is whether this
was some kind of accident that the Qt application should correct for its
child processes, or whether it was intentional.

As you said, any modern Linux, macOS or FreeBSD defaults to UTF-8, so
chances are high that any deviation from that is actually intentional.
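For illustration only, a minimal C sketch of detecting that rare case with the POSIX calls setlocale() and nl_langinfo(CODESET), and of switching only this process's LC_CTYPE to C.UTF-8. This is not Qt's implementation; the helper names are hypothetical, and it covers only option a). Option b), keeping the language and swapping just the codeset, is not shown. Note that setlocale() changes the current process only: children read their locale from LANG/LC_* in the environment, which this sketch leaves untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Returns 1 if the codeset of the current LC_CTYPE is UTF-8. */
static int codeset_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return strcasecmp(cs, "UTF-8") == 0 || strcasecmp(cs, "utf8") == 0;
}

/* Hypothetical helper, not Qt code: adopt the locale from the environment
 * (LANG, LC_CTYPE, LC_ALL) and, only if its codeset is not UTF-8, switch
 * LC_CTYPE of this process to C.UTF-8. The environment is left as-is, so
 * child processes still inherit the user's original setting.
 * Returns 1 if a UTF-8 LC_CTYPE is in effect afterwards, 0 otherwise. */
static int ensure_utf8_ctype(void)
{
    setlocale(LC_ALL, "");
    if (codeset_is_utf8())
        return 1;                   /* the common case on modern systems */
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
        return 0;                   /* C.UTF-8 is not available everywhere */
    return codeset_is_utf8();
}
```

Touching only LC_CTYPE (rather than LC_ALL) is an assumption of this sketch; it keeps message language, collation and number formatting as the user chose them.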

> The most common scenario of our setting something is when LC_ALL=C was 
> set in the environment, which will cause us to reset it to C.UTF-8.

I understand that, and even though I am not aware of an actual problem in
my personal use, I am a bit reluctant to expose unsuspecting processes
to a variable-length encoding they may not be prepared for. At the very
least, there is potential for buffer overruns here.

Also, going from "C" to "C.UTF-8" might break code in a child process
that explicitly checks for the string "C".
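Both concerns can be illustrated with a small, hypothetical C sketch (not taken from any real tool). legacy_is_plain_c() mimics an old child process that recognises only the exact locale names "C" and "POSIX", so it takes the other branch once LC_CTYPE reads "C.UTF-8". bytes_for_chars() shows why "one byte per character" buffer sizing breaks: MB_CUR_MAX is 1 in the C locale but larger in a UTF-8 locale, where for example "é" takes two bytes.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical check of the kind an old child process might contain:
 * it recognises only the exact names "C" and "POSIX", so a locale
 * switched to "C.UTF-8" silently takes the other branch. */
static int legacy_is_plain_c(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);   /* query, don't change */
    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Worst-case number of bytes needed to store n characters in the current
 * LC_CTYPE. Code that hard-codes one byte per character (MB_CUR_MAX == 1,
 * true in the C locale) under-allocates once a UTF-8 locale is active. */
static size_t bytes_for_chars(size_t n)
{
    return n * MB_CUR_MAX;
}
```

The exact value of MB_CUR_MAX in a UTF-8 locale differs between C libraries; the sketch relies only on it being greater than 1.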

> > > > ones, would not make the same choices. If we do not propagate, we could
> > > > end up with a child process (often helpers) that make different choices
> > > > as to what command-line arguments or pipes or contents in files mean.
> > 
> > If we propagate we'll expose the child processes to locales they might not
> > expect, in circumstances where the user of the system possibly intentionally
> > chose a non-UTF8-locale to make exactly those child processes happy.
> True, but that was done at the expense of running Qt in a largely unsupported 
> and untested scenario. Setting the locale to C means we can't access any file 
> with an 8-bit file name; setting to Latin1 would allow that, but produce
> mojibake in GUI.

Setting to "C" also "works" in practice when blobs are just read and written.

> > Effectively, going for "c) yes" deprives the user of a certain level of
> > freedom that is needed, "c) no" is less intrusive.
> > 
> > "c) no" as default and a simple one-liner opt-in for applications that
> > want to engage in "strict parenting" might be an option, too.
> How about making the resetting opt-out, instead of opt-in? 

I was thinking more of a runtime option. Like 


Or am I missing a reason why this has to be a compile-time choice?
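A rough C sketch of what a runtime (rather than compile-time) switch for "c)" could look like, assuming the decision is read from an environment variable. EXAMPLE_PROPAGATE_UTF8_LOCALE, propagate_requested() and prepare_child_locale_env() are hypothetical names invented for illustration; nothing here is an existing Qt or libc setting. The default shown is opt-in ("c) no" unless requested); an opt-out default would simply invert the check.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime opt-in, not an existing Qt or libc setting:
 * the choice is read when the program runs, so no rebuild is needed. */
static int propagate_requested(void)
{
    const char *v = getenv("EXAMPLE_PROPAGATE_UTF8_LOCALE");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Meant to be called before spawning child processes.
 * propagate == 0: inherited LANG / LC_* stay exactly as the user set them
 *                 ("c) no").
 * propagate != 0: an explicit LC_ALL=C or LC_ALL=POSIX is rewritten to
 *                 C.UTF-8 ("c) yes").
 * Returns 1 if the environment was changed, 0 if not, -1 if setenv() failed. */
static int prepare_child_locale_env(int propagate)
{
    if (!propagate)
        return 0;
    const char *all = getenv("LC_ALL");
    if (all == NULL || (strcmp(all, "C") != 0 && strcmp(all, "POSIX") != 0))
        return 0;
    return setenv("LC_ALL", "C.UTF-8", 1) == 0 ? 1 : -1;
}
```

A program would call prepare_child_locale_env(propagate_requested()) once before fork()/exec(). Because the decision happens at run time, the same binary can serve both users who want "strict parenting" and users who deliberately run old tools under a non-UTF-8 locale.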

