[Development] RFC: Defaulting to or enforcing UTF-8 locales on Unix systems

Thiago Macieira thiago.macieira at intel.com
Mon Nov 18 13:14:18 CET 2019


On Monday, 18 November 2019 00:12:19 CET Giuseppe D'Angelo via Development 
wrote:
> I don't know either. Is it to make QtCore smaller? Wasn't the feature
> system ("Qt Lite") supposed to address that? Or is it to make it less of
> a "kitchen sink", and split it in smaller libraries? Could that mean
> having QTextCodec in its own library, and QXmlStreamReader in another
> (that depends on the former)?

The codecs we want to remove are just big tables mapping old, legacy encodings 
to UTF-16. We can easily remove those.

After that, removal of QTextCodec itself is not a big gain.

> > Related to that is the discussion of whether UTF-8 is the only acceptable
> > locale on Unix systems. If we don't have QTextCodec, then we have to have
> > something fixed for QString::fromLocal8Bit and it would necessarily be
> > UTF-8. But even if we do have QTextCodec, that's still a reasonable
> > question: should we assume it is UTF-8? And should we enforce it? Those were
> > the questions in my OP.
> 
> Should fromLocal8Bit be following the locale environment instead
> (LC_CTYPE, LC_MESSAGES or similar)?

That's what it does today. The question is whether we can assume those imply 
UTF-8, like we do when QT_LOCALE_IS_UTF8 is defined.
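
To make the question concrete, here is a minimal sketch of how a process can 
ask whether its locale implies UTF-8. It assumes a POSIX system with 
nl_langinfo(). The helper names codesetIsUtf8 and localeIsUtf8 are hypothetical 
and this is not how QtCore is implemented.

```cpp
#include <clocale>
#include <langinfo.h>   // POSIX: nl_langinfo(), CODESET
#include <strings.h>    // POSIX: strcasecmp()

// Hypothetical helper: true if the codeset name denotes UTF-8.
// Accepts the spellings "UTF-8" and "UTF8", compared case-insensitively.
static bool codesetIsUtf8(const char *codeset)
{
    return codeset != nullptr
        && (strcasecmp(codeset, "UTF-8") == 0 || strcasecmp(codeset, "UTF8") == 0);
}

// Hypothetical helper: adopts the environment's LC_CTYPE setting for this
// process, then reports whether the resulting codeset is UTF-8.
static bool localeIsUtf8()
{
    std::setlocale(LC_CTYPE, "");   // follow LC_ALL / LC_CTYPE / LANG
    return codesetIsUtf8(nl_langinfo(CODESET));
}
```

An application that wanted to refuse to run in a non-UTF-8 locale could call 
localeIsUtf8() early in main() and bail out if it returns false.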

> > If QTextCodec is not in QtCore, then most likely you can't affect how
> > QtCore and almost all other Qt classes decode 8-bit data into QString,
> > including QTextStream.
> 
> See above -- it also means QTextStream goes in some I/O lib that
> contains or depends on the codecs lib.

Or we remove the ability in QTextStream to specify the codec, which is what 
the proposed change would do. I don't think we can move QTextStream out of 
QtCore.

> Why do we bother about "saving the world"? A misconfigured system is the
> user's mistake. They should be in charge of fixing it in order to
> address the problem.

That is an option and this is what the qFatal I mentioned would do.

> > For #2, the sub-questions of the OP apply:
> >   a) What should Qt 6 assume the locale to be, if no locale is set?
> >   b) In case a non-UTF-8 locale is set, what should we do?
> >   c) Should we propagate our decision to child processes?
> > 
> > My preferences were:
> >   a) C.UTF-8
> >   b) override it to force UTF-8 on the same locale
> >   c) yes
> 
> How about
> 
> a) either C / C.UTF-8, but warning the user; but I'd up the ante, and
> say: just assert/crash.
> 
> b) keep the choice. Silently changing it sounds like a bad idea; we
> should never override the user choices silently.

That means keeping QTextCodec and the ability to work with an arbitrary codec.

> c) no. We shouldn't "fix" subprocesses. They have the right to make
> their own independent decisions.

This is not about fixing the subprocess, but about ensuring that it can talk 
to the current process. And it's only necessary if in (b) we override, 
selecting UTF-8. If we don't override or if we forbid running with a non-UTF-8 
locale, then we don't need to set the environment.
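
As a rough sketch of what "override and set the environment" could look like, 
assuming the usual language[_territory][.codeset][@modifier] layout of locale 
names and POSIX setenv(): the helpers withUtf8Codeset and exportUtf8ForChildren 
are hypothetical names, mapping "POSIX" and an empty name to C.UTF-8 is my 
assumption, and nothing here checks that the resulting locale is actually 
installed on the system.

```cpp
#include <stdlib.h>   // POSIX: getenv(), setenv()
#include <string>

// Hypothetical helper: returns the locale name with its codeset replaced by
// UTF-8, keeping the language, territory and any @modifier.
static std::string withUtf8Codeset(const std::string &name)
{
    if (name.empty() || name == "POSIX")
        return "C.UTF-8";                       // assumption: nothing set means C.UTF-8
    const std::string::size_type at = name.find('@');
    const std::string modifier = (at == std::string::npos) ? std::string() : name.substr(at);
    std::string base = name.substr(0, at);      // everything before the modifier
    const std::string::size_type dot = base.find('.');
    if (dot != std::string::npos)
        base.erase(dot);                        // drop the old ".codeset"
    return base + ".UTF-8" + modifier;
}

// Hypothetical helper: rewrites the variable that decides the character type
// so that child processes inherit a UTF-8 codeset. POSIX precedence is
// LC_ALL, then LC_CTYPE, then LANG.
static bool exportUtf8ForChildren()
{
    const char *all = ::getenv("LC_ALL");
    if (all && *all)
        return ::setenv("LC_ALL", withUtf8Codeset(all).c_str(), 1) == 0;
    const char *ctype = ::getenv("LC_CTYPE");
    if (!ctype || !*ctype)
        ctype = ::getenv("LANG");
    return ::setenv("LC_CTYPE", withUtf8Codeset(ctype ? ctype : "").c_str(), 1) == 0;
}
```

With this, pt_BR.ISO-8859-1 becomes pt_BR.UTF-8 and plain C becomes C.UTF-8, 
so the language and territory the user chose are kept and only the codec 
changes.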

> Or, on the other hand: what is the chance that a system comes without a
> locale set? Which is more likely: that it's an accident or a
> deliberate setting? If it's an accident, why not be *very* verbose
> about it?

It's extremely unlikely that a Qt application, especially a Qt 6 one, will be 
run with no locale set. So if the locale isn't set to UTF-8, then it's 
explicit. The question is whether it was *intentional* to change the codec.

As I've argued time and again, changing the locale to English is standard 
practice in any tool parsing another tool's output. But did they mean to 
change the codec too?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products




