[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

Thiago Macieira thiago.macieira at intel.com
Mon Nov 4 23:27:23 CET 2019


On Monday, 4 November 2019 10:55:03 PST Thiago Macieira wrote:
> I'll do a full search on Clear Linux to see if there's any software that
> checks the return value of setlocale().

I searched for all "setlocale" calls.

First, the calls whose result is passed to strcmp: I found comparisons in gnulib 
and in replacements for setlocale, which don't count (they're replacements for 
old systems Qt no longer runs [has never run?] on). That left a couple of 
examples of exactly what you predicted:

glfw-3.3/src/x11_init.c:    if (strcmp(setlocale(LC_CTYPE, NULL), "C") == 0)
https://github.com/glfw/glfw/blob/master/src/x11_init.c#L934-L942
A hack around the "C" locale not supporting wide characters, which wouldn't be 
needed if we set the environment.

firefox-60.1.0/xpcom/build/XPCOMInit.cpp:  if (strcmp(setlocale(LC_ALL, nullptr), "C") == 0) {
https://searchfox.org/mozilla-central/source/xpcom/build/XPCOMInit.cpp#337
the next line does setlocale(LC_ALL, "")

wxWidgets-3.1.2/src/common/intl.cpp:        wxASSERT_MSG( strcmp(setlocale(LC_ALL, NULL), "C") == 0,
https://github.com/wxWidgets/wxWidgets/blob/master/src/common/intl.cpp#L1694
Appears to be Windows-specific.
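
To make the concern with these hits concrete, here is a minimal sketch of the 
exact-match pattern next to a looser check. The helper names (is_exactly_c, 
is_c_family, current_ctype_name) are hypothetical and not taken from any of the 
projects above. The point is only that an exact comparison against "C" treats a 
name such as "C.UTF-8" as "not C", so whatever fallback hangs off that branch 
would be skipped if the default locale name changed.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Exact comparison, the same shape as the hits above: true only for the
 * literal name "C". */
static bool is_exactly_c(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "C") == 0;
}

/* Hypothetical looser check (an assumption for illustration only): also
 * accepts "POSIX" and any "C.<codeset>" name such as "C.UTF-8". */
static bool is_c_family(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0)
        return true;
    return strncmp(name, "C.", 2) == 0;
}

/* Name of the current LC_CTYPE locale. A NULL second argument makes
 * setlocale() a pure query; it does not change anything. */
static const char *current_ctype_name(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL ? name : "";
}
```

Code that does is_exactly_c(current_ctype_name()) keeps working today, but 
would take the other branch under a "C.UTF-8" default, while is_c_family() 
would not notice the difference.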

The assignments are much more numerous (1700 of them in my listing). A lot of 
them are of the form:
  old_locale = setlocale(LC_xxx, NULL);
which I assume is later followed up by a setlocale(LC_xxx, old_locale). These 
cases are not relevant to us.
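
For reference, a minimal sketch of that save/restore idiom. The function names 
save_locale and restore_locale are made up for this example. It copies the 
returned name, since the string setlocale() returns may be overwritten by a 
later setlocale() call.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of the current locale name for `category`, or NULL on
 * failure. The caller owns the copy and must free() it. */
static char *save_locale(int category)
{
    /* Query only: a NULL second argument leaves the locale unchanged. */
    const char *current = setlocale(category, NULL);
    if (current == NULL)
        return NULL;

    size_t len = strlen(current) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, current, len);
    return copy;
}

/* Switch `category` back to a name obtained from save_locale().
 * Returns 0 on success, -1 if setlocale() rejected the name. */
static int restore_locale(int category, const char *saved)
{
    assert(saved != NULL);
    return setlocale(category, saved) != NULL ? 0 : -1;
}
```

Nothing in this pattern inspects the saved name, which is why these cases 
don't care what the default locale is called.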

https://github.com/GNUAspell/aspell/blob/master/common/config.cpp#L549-L561
Needs to find the locale to know what language to apply spelling for and also 
how to decode the text. UTF-8 is supported.

http://git.savannah.gnu.org/cgit/bash.git/tree/locale.c
Aside from the check *for* UTF-8 in LC_CTYPE, the assignments are only 
checking for null pointers.
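
As an illustration of what a "check for UTF-8" can look like, here is a hedged 
sketch of two common approaches: asking nl_langinfo(CODESET), and parsing the 
codeset suffix of a locale name. This is not bash's code; the function names 
are hypothetical and the set of accepted spellings ("UTF-8", "utf8", ...) is my 
assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* ASCII-only lowercase, so the comparison does not depend on the locale. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Compare the first `len` bytes of `s` with `lit`, ignoring ASCII case. */
static bool equals_ascii_nocase(const char *s, size_t len, const char *lit)
{
    if (strlen(lit) != len)
        return false;
    for (size_t i = 0; i < len; i++)
        if (ascii_lower(s[i]) != ascii_lower(lit[i]))
            return false;
    return true;
}

/* Name-based check: looks at the part between '.' and an optional '@'
 * modifier, e.g. "UTF-8" in "de_DE.UTF-8@euro". */
static bool name_is_utf8(const char *name)
{
    assert(name != NULL);
    const char *dot = strchr(name, '.');
    if (dot == NULL)
        return false;
    const char *codeset = dot + 1;
    size_t len = strcspn(codeset, "@");
    return equals_ascii_nocase(codeset, len, "UTF-8") ||
           equals_ascii_nocase(codeset, len, "UTF8");
}

/* Runtime check: asks the C library for the codeset of the locale that is
 * currently active for LC_CTYPE. */
static bool current_ctype_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

Either check keeps returning the right answer if the default becomes a UTF-8 
locale; neither depends on the locale being named exactly "C".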

http://git.savannah.gnu.org/cgit/bison.git/tree/src/getargs.c#n446
http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/system.h
Not relevant for us.

https://github.com/BOINC/boinc/blob/master/zip/zip/zip.c#L2214
Null check only, and checks for UTF-8

https://github.com/BOINC/boinc/blob/master/zip/unzip/unzip.c#L773
Not relevant, in #else for nl_langinfo

https://github.com/microsoft/cpprestsdk/blob/master/Release/src/utilities/asyncrt_utils.cpp
Win32 only

https://github.com/apple/cups/blob/master/cups/language.c
Handles UTF-8 just fine.

https://github.com/apple/cups/blob/master/cups/langprintf.c
Forces .UTF-8.

https://github.com/doxygen/doxygen/blob/master/qtools/qtextcodec.cpp#L508-L529
Trying to guess what QTextCodec to use for ru_RU.

https://git.enlightenment.org/core/efl.git/tree/src/modules/ecore_imf/xim/ecore_imf_xim.c#n832
Null check only. The rest of EFL is save/restore.

http://git.savannah.gnu.org/cgit/emacs.git/tree/src/sysdep.c#n4049
Null check only.

http://git.savannah.gnu.org/cgit/emacs.git/tree/src/sysdep.c#n4049
COULD get it wrong, as it does strcmp(locale, "C") and then sets locale = "en".

https://github.com/GNOME/evince/blob/mainline/cut-n-paste/synctex/synctex_parser.c#L4384-L4399
Save/restore.

https://github.com/GNOME/evolution-data-server/blob/mainline/src/camel/camel-iconv.c#L218
Does compare to "C", but not a problem since the failing case uses nl_langinfo

https://github.com/GNOME/evolution-data-server/blob/mainline/src/addressbook/libedata-book/e-book-sqlite.c#L2891
Doesn't seem to be a problem.

https://github.com/GNOME/evolution/blob/mainline/src/e-util/e-xml-utils.c#L66
Just getting defaults.

https://github.com/fish-shell/fish-shell/blob/3.0.2/src/env.cpp#L373-L396
Comparing old to new. And no longer present in master.

https://github.com/fltk/fltk/blob/master/src/Fl_Native_File_Chooser_GTK.cxx#L445-L458
Save/restore, not thread-safe.
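
On the thread-safety remark: setlocale() changes the locale of the whole 
process, so save/restore around it races with other threads. A minimal sketch 
of the per-thread alternative from POSIX.1-2008 (newlocale/uselocale) follows. 
It is a generic illustration, not what FLTK does or needs; format_double_c is a 
hypothetical helper, and I'm assuming a system such as Linux/glibc where these 
functions are declared in <locale.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Format `value` with a '.' decimal separator regardless of the process-wide
 * locale, by switching only the calling thread's locale.
 * Returns the snprintf() result, or -1 if the locale could not be created. */
static int format_double_c(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);

    /* New locale object with LC_NUMERIC taken from "C". */
    locale_t c_locale = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0)
        return -1;

    locale_t previous = uselocale(c_locale);  /* affects this thread only */
    int written = snprintf(buf, size, "%.2f", value);
    uselocale(previous);                      /* put the old one back */

    freelocale(c_locale);
    return written;
}
```

Other threads keep whatever locale they had while this runs, which is the 
property the setlocale()-based save/restore cannot give.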

https://github.com/zenotech/fox-toolkit/blob/master/src/FXTranslator.cpp#L84
Commented out.

http://git.savannah.gnu.org/cgit/gawk.git/tree/support/dfa.c#n988
Not a problem, just checking if the locale is ASCII-compatible.

binutils-gdb/blob/master/readline/readline/nls.c
Seems fine too.

https://github.com/geany/geany/blob/master/src/libmain.c#L980-L987
Only used in debug output

https://github.com/fangq/gftp/blob/master/lib/protocols.c#L382-L395
Null-pointer check & logging

https://github.com/GNOME/glib/blob/mainline/glib/guniprop.c#L724
Safe

https://github.com/GNOME/glib/blob/mainline/glib/gtranslit.c#L293
Seems to be fine

https://github.com/GNOME/glib/blob/mainline/glib/gdate.c#L1057-L1065
Checking cached results

I'm stopping here.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setlocale-grep.zst
Type: application/zstd
Size: 101220 bytes
Desc: not available
URL: <http://lists.qt-project.org/pipermail/development/attachments/20191104/7daaad62/attachment-0001.bin>

