[Development] QString and related changes for Qt 6

Tue May 12 12:36:23 CEST 2020

> On 12 May 2020, at 11:34, André Pönitz <apoenitz at t-online.de> wrote:
> 
> On Tue, May 12, 2020 at 07:49:06AM +0000, Lars Knoll wrote:
>> I believe it’s important to leave the non Unicode world behind us [...]
> 
> Is that meant to be a convincing technical argument?

This is not a technical argument per se.
> 
>>   * We have extensive support for legacy text encodings in Qt Core, that
>>   should not be there anymore in 2020
> 
> To clarify, since this kind of things is easily misread:
> 
>  "It should not be _in Qt Core_, but it should be somewhere else _in Qt_."
> 
> Getting easy access to encodings is a valuable feature of Qt.

A separate library that uses ICU behind the scenes is something I agree with. QTextCodec in its current form, not so much.
> 
>>   * We offer options to generate HTML or XML in legacy encodings, even
>>   though the standard clearly says that those are deprecated
>> 
>>   * to/fromLocal8Bit() should be equivalent to to/fromUtf8() on all but
>>   Windows (where we’re still a few years away from fully getting rid of
>>   this)
>> 
>>   * source code encoding is undefined
>>   Cleaning this up has progressed quite a bit, and a lot of changes in
>>   various classes have been merged. There’s a large set of changes
>>   currently being reviewed to remove QTextCodec as a dependency in Qt
>>   (it’ll get moved to libQt5Compat), and introduce a new QStringConverter
>>   class, that can handle transcoding between Unicode encodings, Latin1
>>   and the system locale. For all systems except Windows, we make the
>>   additional assumption that the system locale is UTF-8 (see also my
>>   other mail about UTF-8 as System locale on Windows).
> 
> libQt5Compat is something that's likely to go away in Qt 7. I don't see the
> general need for text codecs going away. So it would make more sense to have
> them in a module of their own from the beginning.

See above; the new QStringEncoder/Decoder can also support additional encodings (though that’s not yet implemented).
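To make "transcoding between Unicode encodings and Latin1" concrete: Latin-1 is the simplest non-UTF case such a converter has to handle, because every Latin-1 byte value is identical to the Unicode code point in the range U+0000..U+00FF. The sketch below is plain standard C++17, not Qt code; `latin1ToUtf8` and `utf8ToLatin1` are hypothetical helper names and are not part of the QStringConverter/QStringEncoder/QStringDecoder API.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not a Qt API): Latin-1 (ISO 8859-1) bytes to UTF-8.
// Latin-1 byte values equal code points U+0000..U+00FF, so this never fails:
// ASCII bytes are copied, bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                 // ASCII unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte 110xxxxx
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation 10xxxxxx
        }
    }
    return out;
}

// Hypothetical helper (not a Qt API): UTF-8 back to Latin-1.
// Returns std::nullopt for any code point above U+00FF (Latin-1 cannot
// represent it) or for malformed/truncated UTF-8.
std::optional<std::string> utf8ToLatin1(const std::string &utf8)
{
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = utf8[i];
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()
                   && (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80) {
            // Only lead bytes C2/C3 encode U+0080..U+00FF.
            unsigned char next = utf8[++i];
            out.push_back(static_cast<char>(((c & 0x1F) << 6) | (next & 0x3F)));
        } else {
            return std::nullopt; // not representable in Latin-1, or malformed
        }
    }
    return out;
}
```

The asymmetry is the point about legacy encodings: Latin-1 to UTF-8 is always lossless, while the euro sign (U+20AC, UTF-8 bytes E2 82 AC) has no Latin-1 representation, so `utf8ToLatin1` returns `std::nullopt` for it.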
> 
>>   A next step is to change the build system, so that it (by default)
>>   assumes that source code is encoded in UTF-8. We are already setting
>>   compiler flags to ensure this when building Qt itself, but are not
>>   doing this yet for user code.
> 
> Which makes sense, because it's not up to a library to dictate what user
> code has to look like.

Funny how most other programming languages actually ‘dictate’ that. gcc and clang have both made this the default for quite some time (even if your system locale happens not to be UTF-8). I’ve not seen complaints about this anywhere.
> 
>>   But gcc and clang do already treat all source code as UTF-8 by default
>>   (and I believe ICC does the same at least on platforms other than
>>   Windows). MSVC will require a /utf-8 flag to enable this, something
>>   that I want to add to the default config for both qmake and cmake when
>>   compiling a Qt app. Without it, MSVC will still assume the source code
>>   is encoded in the current ANSI code page, and u"…" or u8"…" will result
>>   in garbage. Worse, it’ll lead to non-portable code that might compile
>>   correctly on one developer machine and create garbage on the next one
>>   (as it uses a different locale).
> 
>>   Changing this also for our users will make source code written for Qt
>>   more portable and bring Qt on par with most other programming languages
>>   in the world that already mandate utf8 as the source encoding (JS,
>>   Swift, Java, etc).
> 
> "Bringing on par" by cutting functionality that is.
> 
> To me it is unclear how relevant citing other languages here is. If anything
> at all, Standard C++ would be relevant, which does *not* mandate UTF-8.

We are talking about what the default is. If someone really wants a different encoding, they can still use one. And the default is already UTF-8 on all platforms but Windows (where it’s the current ANSI code page, which means anything beyond ASCII is not well defined at all).
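The "not well defined" part can be made visible at compile time. The sketch below is plain C++ and assumes the file is saved as UTF-8 without a BOM; `utf16Length` is an illustrative name only. It compares a literal containing a raw non-ASCII character with the same character written as the universal character name `\u00E4`, whose meaning does not depend on how the compiler decodes the source file. gcc and clang should accept it by default; MSVC running under a non-UTF-8 ANSI code page should reject it unless `/utf-8` is passed.

```cpp
#include <cstddef>

// "ä" below is stored in the file as the UTF-8 bytes C3 A4.
// "\u00E4" is a universal character name: same character, but independent
// of the source encoding the compiler assumes.
//
// If the compiler decodes the file as UTF-8, both literals hold one
// character and the sizes match. If it decodes the file in an ANSI code
// page such as cp1252, the raw bytes are read as two characters ("Ã¤"),
// the literals differ in size, and compilation fails here.
static_assert(sizeof(u8"ä") == sizeof(u8"\u00E4"),
              "source file is not decoded as UTF-8 (MSVC: compile with /utf-8)");
static_assert(sizeof(u"ä") == sizeof(u"\u00E4"),
              "UTF-16 literal differs: source encoding mismatch");

// Number of UTF-16 code units in u"ä", excluding the terminating null.
constexpr std::size_t utf16Length = sizeof(u"ä") / sizeof(char16_t) - 1;
```

A check like this turns "compiles correctly on one developer machine and creates garbage on the next" into a build error on the misconfigured machine instead of a silent runtime bug.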
> 
>>   [...]
> 
>>   Comments are welcome, [...]
> 
> I buy a "codecs are too big for Qt Core, they should be separate" argument
> (that was not made here unless I overlooked it) and I buy the "there should
> not be multiple overloads for the mass of string-taking functions in the API"
> argument. I'd even buy a "we don't have resources to even keep it around".
> 
> I don't understand the motivation for the "legacy", "believe", "important to
> leave behind" line of reasoning.

Leaving things behind simplifies our lives and, in the longer term, also our users’ lives. And yes, non-Unicode encodings are legacy in today’s world. They need to disappear, and most people are working towards that goal. We can and should do our part.

Lars