[Development] FW: Backwards compatibility break in Qt 5.5

Mon Jul 27 18:54:49 CEST 2015

On 27/07/15 18:13, "Thiago Macieira" <thiago.macieira at intel.com> wrote:

>On Monday 27 July 2015 08:44:30 Knoll Lars wrote:
>> I can understand the issues that non latin speakers are facing with this
>> well. I’ve often used qDebug() to debug non latin use cases, and in 90% of
>> the cases I just want to know what the string reads. The unicode content
>> of it is interesting to me in only 10% of the cases. For most users not
>> debugging Qt’s internals I would assume that ratio to be more like 99 to 1.
>
>As I said in my email, the problem is that you will not turn this on for those 
>10% of the cases until you've already lost data, either visibly or really. I'm 
>counting ambiguity and homographs as apparent data loss because it will send 
>you down the wrong debugging path.

I don't think that argument holds. This happens very rarely in real-world use cases, and you don't lose data because of debug output. In most cases, you want to quickly see whether the string says bøh or bæh, and I lose a lot of time as a developer if I can't see that at a glance.
>
>> Thiago, which backends can’t handle utf8 or utf16 output these days?
>
>Android log, slog2, syslog and the regular stderr output if the system locale 
>isn't UTF-8. 

Well, Android systems are always UTF-8 afaik. And all modern Linux systems are UTF-8 as well.

>In particular, on Windows the system locale is never UTF-8 unless 
>you're using Vietnamese Windows. On the other hand, the OS X system locale is 
>always UTF-8 unless you've messed up, including with setCodecForLocale.

On Windows we use OutputDebugString by default, which takes UTF-16. So Mac and Windows are ok as well.

That leaves QNX/slog2, and if the system charset there is not UTF-8, I'm happy if you escape all chars that it can't show on that platform (and in this case QNX should fix their OS).

But we shouldn't penalize everybody because of one platform that very few people use.
>
>Or, if I list those that always do: Windows OutputDebugString and journald.
>
>None of those are binary-safe, which means they will not work with a QString 
>containing a NUL character.

Yes, that's why I said that we should encode non-printable characters.
>
>> One idea could be a slightly modified approach that is more compatible:
>> 
>> * We always encode QStrings as utf8 (maybe utf16 for the windows console)
>> when sending them to qDebug()
>
>That would change behaviour for people with non-UTF-8 systems, as they'd now 
>see mojibake.

Well, I think there are no non-UTF-8 systems left apart from Windows, where this isn't a problem (since we use OutputDebugString).
>
>> * We quote every character that is not printable (ie. !QChar::isPrint())
>
>This is what I've done, except that I used ctype.h's isprint(), under the "C" 
>locale. I considered using QChar::isPrint here, but it would be very expensive 
>and it would not solve the homograph and ambiguity problem anyway. Better to 
>just do nothing than use QChar::isPrint.

QChar::isPrint() is not that expensive, and this is about debug output. Your code escaping all these chars is more expensive if it gets hit.

And I don't see how homographs are a problem for debug output, unless you're debugging certain rather special cases, in which case you should be able to turn full escaping on. I would guess that 98% of our users are using qDebug differently than you are.
>
>> * We add a flag that would give fully quoted strings so you can get a
>> quoted version if required (e.g. qDebug() << quoteStrings)
>
>I'm ok with a flag, as long as it's the default.

Well, having to turn this on for every qDebug() line is a major issue for large existing code bases. You can say that this was never really supported by qDebug, but in practice it worked on Windows, any modern Linux, Mac OS X (and I'd assume also iOS and Android).

So people were using it. Suddenly breaking it is a major issue for anybody developing programs in non-ASCII (esp. non-Latin) languages and trying to debug them. Just think of the use case of trying to figure out what a string contains at a certain point in your program. If the output is human readable, it's immediately obvious. If I just get escaped data, I need to spend lots of time figuring it out.

So IMO the default behaviour should be to show Unicode strings in a form that is readable but at the same time doesn't hide things like zero width spaces (i.e. it should escape non-printable chars).
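
To make that concrete, here is a minimal sketch in plain C++ (no Qt, so it compiles on its own). All names in it (isPrintableSketch, appendUtf8, appendEscape, debugEscape) are hypothetical, and isPrintableSketch is only a rough stand-in for QChar::isPrint(), not its real behaviour. It is not the code that is in Qt 5.5, just an illustration of "readable by default, escape what you can't see", with an opt-in switch for full escaping.

```cpp
#include <cstdio>
#include <string>

// Hypothetical, deliberately small stand-in for QChar::isPrint().
// A real implementation would look at the Unicode general category.
bool isPrintableSketch(char32_t c)
{
    if (c < 0x20 || (c >= 0x7F && c <= 0x9F)) return false; // C0/C1 controls, DEL
    if (c == 0x00AD || c == 0xFEFF) return false;           // soft hyphen, BOM
    if (c >= 0x200B && c <= 0x200F) return false;           // zero width space etc.
    if (c >= 0x202A && c <= 0x202E) return false;           // bidi controls
    if (c >= 0x2060 && c <= 0x2064) return false;           // word joiner etc.
    if (c >= 0xD800 && c <= 0xDFFF) return false;           // lone surrogates
    if (c > 0x10FFFF) return false;                         // outside Unicode
    return true;
}

// Append one code point to out, encoded as UTF-8.
void appendUtf8(std::string &out, char32_t c)
{
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Append one code point as a \uXXXX or \UXXXXXXXX escape.
void appendEscape(std::string &out, char32_t c)
{
    char buf[16];
    if (c <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
    else
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(c));
    out += buf;
}

// Default: printable characters stay readable, everything else is escaped.
// escapeAllNonAscii = true is the opt-in "fully quoted" mode.
std::string debugEscape(const std::u32string &s, bool escapeAllNonAscii = false)
{
    std::string out = "\"";
    for (char32_t c : s) {
        if (c == U'"' || c == U'\\') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (isPrintableSketch(c) && (c < 0x80 || !escapeAllNonAscii)) {
            appendUtf8(out, c);
        } else {
            appendEscape(out, c);
        }
    }
    out += '"';
    return out;
}
```

With this, a string like bøh comes out as readable UTF-8, a zero width space or an embedded NUL shows up as \u200B or \u0000, and passing true as the second argument gives the fully escaped form for the rare cases where you need to see the exact code points.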

Cheers,
Lars