[Development] HEADS-UP: QStringLiteral
Thiago Macieira
thiago.macieira at intel.com
Wed Aug 28 07:21:06 CEST 2019
On Tuesday, 27 August 2019 16:57:55 PDT Kevin Kofler wrote:
> If you do not explicitly add ".UTF-8", glibc always gives you the obsolete
> legacy locale with the locale-specific pre-Unicode character set. This is
> intentional for backwards compatibility. So you should never use a locale
> without a ".UTF-8" suffix, unless, like Thiago, you want to deliberately
> test what happens in a legacy non-UTF-8 locale.
>
> The locales are interpreted by glibc. Anything that assumes that a given
> locale uses a character set different from what glibc actually uses for that
> locale is broken. (But it looks like GCC doesn't assume anything about the
> locale and just always uses UTF-8 to begin with, contrary to what the
> documentation claims.)
Indeed. The charset can be obtained with the nl_langinfo(3) function from the
C library. Since there's no tool to print it for us, we use Python:
$ cat langinfo.py
import locale
print(locale.nl_langinfo(locale.CODESET))
$ python3 langinfo.py
UTF-8
$ LC_ALL=C python3 langinfo.py
ANSI_X3.4-1968
$ LC_ALL=pt_BR python3 langinfo.py
ISO-8859-1
$ LC_ALL=fr_FR@euro python3 langinfo.py
ISO-8859-15
$ LC_ALL=el_GR python3 langinfo.py
ISO-8859-7
$ LC_ALL=zh_CN python3 langinfo.py
GB2312
$ LC_ALL=ja_JP python3 langinfo.py
EUC-JP
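For reference, the same lookup straight from the C library is just setlocale()
plus nl_langinfo(CODESET). A minimal sketch (the codeset.c name and the pt_BR
runs below are only illustrative, and assume both the bare and the .UTF-8
locales are generated on the system):
$ cat codeset.c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
    /* adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG) */
    setlocale(LC_ALL, "");
    /* CODESET names the character set the current locale uses */
    puts(nl_langinfo(CODESET));
    return 0;
}
$ cc codeset.c -o codeset
$ LC_ALL=pt_BR ./codeset
ISO-8859-1
$ LC_ALL=pt_BR.UTF-8 ./codeset
UTF-8
That last pair is Kevin's point in a nutshell: without the ".UTF-8" suffix,
glibc hands you the legacy pre-Unicode charset for that locale.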
I'm *so* glad I didn't remember three of the above and haven't had to think
about them for 15 years. (I thought Japanese on Unix used Shift-JIS and Russian
used KOI8-R.)
Anyway, doing a memory wipe. Aside from ISO-8859-1, I don't want to think of
any of the others for another 15 years.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products