[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

Thiago Macieira thiago.macieira at intel.com
Sat Nov 2 16:24:19 CET 2019

On Saturday, 2 November 2019 04:53:10 PDT André Pönitz wrote:
> > TBH, all the more reason for propagating the choice. Please remember that
> > on any modern Linux or macOS or FreeBSD, they are already running with a
> > UTF-8 locale.
> With that argument we wouldn't even need to change the locale for the
> actual Qt application.
> I think we are currently discussing the rare case where the Qt application
> is started with a non-UTF-8 locale, and the main question is whether this
> was some kind of accident that the Qt application should correct for their
> child processes or whether this was intentional.

Right. And the conclusion so far is that it is a mistake.

> As you said, any modern Linux or macOS or FreeBSD default to UTF-8, so
> chances are high that any deviation from that is actually intentional.

Except for the LC_ALL=C case, which overrides the user's locale so that one
gets messages and formatting in a machine-parseable format. The normal case
and this one probably account for over 99% of all scenarios.
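
A minimal sketch of how the LC_ALL=C case could be detected and mapped, assuming
the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and assuming the
system actually provides a C.UTF-8 locale. The function names are hypothetical
and this is not Qt's implementation:

```cpp
#include <initializer_list>
#include <string>

// Returns the locale name that governs LC_CTYPE, using the POSIX
// precedence LC_ALL > LC_CTYPE > LANG. Unset or empty values are
// skipped; if nothing is set, the locale is "C".
std::string effectiveCtypeLocale(const char *lcAll, const char *lcCtype,
                                 const char *lang)
{
    for (const char *value : {lcAll, lcCtype, lang}) {
        if (value && *value)
            return value;
    }
    return "C";
}

// Hypothetical policy: a plain "C" or "POSIX" locale is upgraded to
// "C.UTF-8" (assumed to exist on the system); anything else is left alone.
std::string localeForChildren(const std::string &current)
{
    if (current == "C" || current == "POSIX")
        return "C.UTF-8";
    return current;
}
```

Called as localeForChildren(effectiveCtypeLocale(getenv("LC_ALL"),
getenv("LC_CTYPE"), getenv("LANG"))), this keeps the machine-parseable
messages of the C locale while changing only the character encoding.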

> > The most common scenario of our setting something is when LC_ALL=C was
> > set in the environment, which will cause us to reset it to C.UTF-8.
> I understand that, and even though I am not aware of an actual problem for
> my personal uses I am a bit reluctant to expose unsuspecting processes
> to a variable-length encoding they may not be aware of. At least there's
> a potential for buffer overruns here.

Is your shell configured for German or for English? Try setting your locale to
German and then see how long it takes before you have to override it when
posting a question or an answer.

$ ls á
ls: cannot access 'á': Arquivo ou diretório inexistente

$ gcc -xc /dev/null
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/crt1.o: na função "_start":
referência não definida para "main"
collect2: error: ld returned 1 exit status

$ gcc -xc /dev/null -lmain                  
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: não foi possível localizar -lmain
collect2: error: ld returned 1 exit status

> Also, going from "C" to "C.UTF-8" might foil code checking for the
> string "C" explicitly in a child process.

True, though it's extremely unlikely that anyone is doing that.

> > True, but that was done at the expense of running Qt in a largely
> > unsupported and untested scenario. Setting the locale to C means we can't
> > access any file with an 8bit file name; setting to Latin1 would allow
> > that, but produce mojibake in GUI.
> Setting to "C" also "works" in practice when blobs are just read and written
> unmodified.

Except when such a blob's file name contains a character outside of the
US-ASCII subset.

$ ./lconvert á.qm  
Cannot open á.qm: No such file or directory
$ LC_ALL=C ./lconvert á.qm
Cannot open ??.qm: No such file or directory

Was this just the output or did it try to open this actual file?
$ strace -E LC_ALL=C ./lconvert á.qm |& grep -F .qm       
execve("./lconvert", ["./lconvert", "\303\241.qm"], 0x55c2ef3cc7a0 /* 118 vars */) = 0
openat(AT_FDCWD, "??.qm", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Arquivo ou diretório inexistente)
write(2, "Cannot open ??.qm: No such file "..., 45Cannot open ??.qm: No such file or directory
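
The trace is consistent with the two UTF-8 bytes of "á" (\303\241) each being
replaced by "?" before the open call. A minimal sketch of that kind of lossy
round trip through a US-ASCII-only codec follows. It only illustrates the
effect; the function name is hypothetical and this is not Qt's actual
conversion code:

```cpp
#include <string>

// Illustrative only: mimics decoding a byte string with a US-ASCII-only
// codec and encoding it again. Every byte outside 0x00..0x7F cannot be
// represented and is replaced by '?'.
std::string roundTripThroughAscii(const std::string &bytes)
{
    std::string result;
    result.reserve(bytes.size());
    for (unsigned char byte : bytes)
        result += (byte < 0x80) ? static_cast<char>(byte) : '?';
    return result;
}
```

With the argv bytes from the execve line, roundTripThroughAscii("\303\241.qm")
yields "??.qm", which is the name passed to openat above.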

> > How about making the resetting opt-out, instead of opt-in?
> I was more thinking of a runtime option. Like
>   QCoreApplication::setPropagateOurChoices(true)

I think a runtime option like that belongs in QProcessEnvironment.

> Or am I missing some reason why this has to be a compile-time choice?

Yes: whether QString::fromLocal8Bit has to support anything besides UTF-8.

Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products
