[Development] Backwards compatibility break in Qt 5.5

Mon Jul 27 18:05:29 CEST 2015

On Monday 27 July 2015 01:25:18 Nikolai Marchenko wrote:
> Hi! We (Russian Qt community) have a situation we could not resolve with
> Thiago Macieira in bug tracker, so have to ask for Chief Maintainer's
> attention.
> The bug in question is there:
> https://bugreports.qt.io/browse/QTBUG-47316

Yes, I asked that they post here if there's a need to overrule me. I wanted 
Nikolai to get his story out first, which is why I waited a day before 
replying.

Now, let's start with the subject: it's a change in behaviour, properly 
documented in the changelog[*], in an area that did not work properly before. 
Previously, the debugging of QStrings would produce ambiguous output if the 
source string had any quote characters. More importantly, it could lose data, 
as the qDebug output was not 8-bit clean. Therefore, the functionality was 
never officially supported.

[* Important: the changelog was not posted for the 5.5 release]

The change makes the functionality now consistent across all platforms, 
through all qDebug backends. It also makes the output unambiguous, since a 
quote character is escaped to indicate it's still part of the string. It adds 
two more interesting features:

1) the string can be copied back into the source code and will compile to 
exactly the contents of the QString or QByteArray that produced it. No more 
need to escape backslashes, for example.

2) the output has no homographs, no control characters and no zero-width 
codepoints (including no combining characters), allowing for easier debugging. 
A homograph is a character whose glyph is identical to that of another character. The
most widely publicised case is that of the Latin letter "a" and the Cyrillic 
"а", which was the attack vector for homograph sites like pаypal.com or 
pаypаl.com or раураl.com or other permutations of the letters. A control 
character could change the way the text gets logged, like for example the 
carriage return character causing overwrite of what had already been output in 
a terminal.
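
To make the escaping rules above concrete, here is a minimal sketch in plain
C++. It is not Qt's implementation: escapeForLog() and usageExample() are
hypothetical names, and the exact set of escapes is an assumption based on the
description above (quotes, backslashes, control characters and everything
outside printable US-ASCII).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, NOT Qt's code: renders a UTF-16 string as a quoted,
// C++-style literal that contains only printable US-ASCII.
std::string escapeForLog(const std::u16string &in)
{
    std::string out = "\"";
    for (char16_t c : in) {
        switch (c) {
        case u'"':  out += "\\\""; break;   // an embedded quote stays unambiguous
        case u'\\': out += "\\\\"; break;
        case u'\n': out += "\\n";  break;
        case u'\r': out += "\\r";  break;   // cannot overwrite terminal output
        case u'\t': out += "\\t";  break;
        default:
            if (c >= 0x20 && c < 0x7f) {
                out += static_cast<char>(c);    // printable ASCII passes through
            } else {
                // Everything else becomes \uXXXX. This sketch emits surrogate
                // halves one by one; a real implementation would have to
                // combine them for the result to be valid source code.
                char buf[8];
                std::snprintf(buf, sizeof buf, "\\u%04X",
                              static_cast<unsigned>(c));
                out += buf;
            }
        }
    }
    out += '"';
    return out;
}

// Usage sketch, with the expected results spelled out as assertions.
void usageExample()
{
    assert(escapeForLog(u"a\"b") == "\"a\\\"b\"");
    assert(escapeForLog(u"\u043F\u0440") == "\"\\u043F\\u0440\"");
}
```

For strings in the Basic Multilingual Plane, the escaped form can be pasted
back into source code and yields the original contents.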

I'm not saying qDebug's output is a security issue or possible attack vector, 
but having no ambiguity and no homographs makes for easier debugging. It could 
save quite a few minutes of staring at a string that *looks* right but whose 
manipulation generates an apparently unexpected result.

That is, after all, the purpose of qDebug: debugging.

By the way, the output of QtTest has also changed in the same way.

> In version 5.5 Thiago Macieira introduced a compatibility break with older
> qt versions.
> This break has come in the form of completely changing the way qDebug()
> prints non-english characters to console.

Let's be clear: this is about non-US-ASCII characters and only if we're 
talking about a QString or QByteArray. If you're passing a character literal 
or char* directly, the feature does not kick in.

People who did not want the quotes around the string before Qt 5.4 would have 
already used qPrintable() or similar mechanisms. Anyone using the noquote() 
feature since 5.4 is also similarly not affected.

> In essence - his "fix" to qDebug() will make it so that all non-english
> output from QString variable to qDebug()
> will be passed as a sequence of unicode escaped characters along the lines
> of:
> 
> "\u043F\u0440\u043E\u0432\u0435\u0440\u043A\u0430
> \u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u044B"
> 
> Previously, in _some_ cases, but not reliably cross platform it would
> actually appear in console as readable output in user's language.
> I have to admit that Thiago is right about it being non-reliable
> contradicts qt's goal. Still...

Right.

> What Thiago completely disregards as unimportant - qDebug()'s been around
> for a looooong time. Even during times of qt4, users,
> through the use of setCodec functions, achieved readable output from qDebug
> in the console in their native language.

Again, not properly and not everywhere. Moreover, the requirements to continue 
to make it work have changed along the way. In Qt 5.0, we introduced a message 
handler that took a QString instead of a char*, so the previous solutions that
relied on setCodecForLocale may have stopped working. (I actually don't know
if they did, as we've never tested it, which goes to prove my point that it
was an unsupported feature.)

Over the next few Qt 5.x releases, we introduced different output backends
too, some of which took the data in UTF-8 and others didn't.

> Note that I say qt4. It's been around for almost a decade and there being
> no warning qDebug ever changing, it's become a standard
> on a lot of non-english installations, that actually allowed getting
> readable output, to use qDebug for tracing variables..
> After all, opening debugger to inspect every little thing is very
> counterproductive.

Agreed, even if it is poor practice to debug via qDebug. A debugger is always a
superior solution. Systems where a debugger will not run are not systems worth 
developing for (fix your toolchain instead or stop using that vendor).

Note that "superior solution" does not imply "most convenient". Sometimes you 
trade one off for the other. That's called "compromise". You'll use debug 
logging when you're not certain what the issue is, so you want to generate a 
trace of execution for later inspection.

Often, that also means leaving the tracing on for production deployment.

> So here is where we are now, there are hundreds (probably thousands
> actually) codebases, where there is code like that:
> qDebug() << someLabel;
> // skip
> qDebug() << someVariableFromDatabase;
> // skip
> qDebug() << someTextFromNetworkService;
> 
> This is kinda normal and completely adhering to Kai Kohne's message he
> posted on youtube in qt-dev days video
> of using qDebug() to log program execution to console. Now, we arrive to
> Qt5.5. From here on now, instead of reading the trace of
> program execution everyone using the old code will be subjected to strings
> like
> 
> \u043F\u0440\u043E\u0432\u0435\u0440\u043A\u0430
> \u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u044B

You forgot the "", but true.

> Note, that there are dozens to hundreds of such lines per application. And
> in SUCH a format,
> the trace posted above becomes completely useless for the purposes of
> debugging,
> because people, generally, don't have unicode parser plugged into their
> brains.

I dispute "becomes completely useless for the purposes of debugging". 
Obviously, I added the feature because I think this makes the output even more 
suitable for debugging. My rationale was that the unambiguous, lossless output 
would allow for catching more conditions that the user was searching for.

What's more, when we're talking about debugging a QString, the objective is to 
understand what's in it. I completely agree that this is more inconvenient to 
read if you have no ASCII characters. In fact, this all started because QtTest 
would print QByteArray comparison failures with a full hex dump: instead, I 
made it print an escaped C string, then brought it to qDebug.

But as a Russian speaker, you must have run at least once into a case where
there was an "a" where an "а" should have been or vice-versa. The new output
makes it evident.
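
As an illustration of the "a" versus "а" case, the following plain C++ sketch
locates the first position at which two look-alike strings differ. The helper
name firstDifference() and the sample strings are assumptions made for this
example; nothing here is part of Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first UTF-16 code unit at which the two
// strings differ, or npos if they are identical.
std::size_t firstDifference(const std::u16string &a, const std::u16string &b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return i;
    }
    // One string may be a prefix of the other.
    return a.size() == b.size() ? std::u16string::npos : n;
}

// Usage sketch: both strings render the same in most fonts.
void usageExample()
{
    const std::u16string latin = u"paypal";
    const std::u16string mixed = u"p\u0430ypal";   // second letter is Cyrillic
    assert(latin != mixed);
    assert(firstDifference(latin, mixed) == 1);
    assert(latin[1] == 0x0061 && mixed[1] == 0x0430);
}
```

Printed as-is, the two strings are indistinguishable on screen; printed with
escapes, the second one shows \u0430 in place of the "a".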

In any case, we're going in the direction of more tool-based logging anyway
(see syslog, journald, alog, slog, etc.) so the lack of proper tooling is
not an argument. You may say that proper tooling could highlight the 
homographs too, but this requires first having lossless output and we don't 
have that.

This does not solve the problem of ambiguity.

> Thiago argues that his way, user will always get exactly which type of A
> the symbol is.
> Our point - now user will first need to pass the output of qDebug to
> unicode converter to even understand what's he looking at.
> 
> Now, again, I do not dispute that qDebug in a way it was being used was
> correct or reliable across all platforms or allowed to check exactly which
> symbol was contained in a string. But BETTER way to fix that, in my
> opinion, and all of the Russian users who are already aware of the problem
> is to introduce a qDebug().escape() or something along those lines, in
> those RARE cases when instead of just tracing variables, user NEEDS to see
> unicode.

The problem with this solution is that you will not use it until it's too 
late. If you're logging strings, it stands to reason that you're not sure what 
the issue is. In turn, the issue may not be logged at all if the issue is in 
the part that the 8-bit-uncleanliness and ambiguity will lose.
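
A small sketch of what "losing data" can look like, in plain C++. It assumes a
hypothetical output backend that substitutes '?' for anything it cannot
represent; toLossyAscii() is an invented name, and this is not a description
of what any particular qDebug backend did.

```cpp
#include <cassert>
#include <string>

// Hypothetical lossy backend: every code unit outside printable US-ASCII is
// replaced with '?'.
std::string toLossyAscii(const std::u16string &in)
{
    std::string out;
    for (char16_t c : in)
        out += (c >= 0x20 && c < 0x7f) ? static_cast<char>(c) : '?';
    return out;
}

// Usage sketch: two different strings collapse into the same log line, so the
// log can no longer tell them apart.
void usageExample()
{
    const std::u16string first  = u"\u043A\u043E\u0442";
    const std::u16string second = u"\u043A\u0438\u0442";
    assert(first != second);
    assert(toLossyAscii(first) == "???");
    assert(toLossyAscii(second) == "???");
}
```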

Therefore, I am completely against the lossy and ambiguous format being the
default. I do accept leaving an option for those who wish to lose data. It 
just has to be opt-in.

> Here we have a conflict of interest: "Do we fix qDebug() in a way it is
> SUPPOSED to work, breaking code for users, or do we make a different
> flavour
> of qDebug that actually works as we intended it to, cross platform and
> escapes everything and let people who need it switch to that?"

Your question is not valid. You're assuming that the way it's supposed to work 
is the way you want it to work. Obviously I dispute that. Your question is 
also very biased.

So I will not bother with it.

> Personally, from what I know of application development, breaking codebase
> is always bad. breaking a LARGE codebase, is horrible.

It was already broken due to depending on an unsupported and broken feature. 
qDebug has never been 8-bit clean. As I said multiple times in the bug report, 
I do not feel any qualms about a change in behaviour in an area that was 
broken.

> So, I am at a loss of why Thiago decided to go the first way. We've tried
> to convince him he underestimates the potential harm to customers of Qt,
> but he refuses to listen insisting that if ppl used qDebug() incorrect,
> they are not entitled to their code not breaking at any moment he seems
> fit to fix it. But the point is - how were we to know we were using
> qDebug() incorrectly, when nothing along those lines were EVER posted on
> qDebug()'s help page?

Because we do not document what the API does not do. We only document what it 
does.

Also, people asking questions in IRC were advised that qDebug was not 8-bit
clean and that they should not rely on it, along with the warning that qDebug
should not be used for formatted output. Of course, such an answer usually
came *after*
someone could not see their non-US-ASCII characters in the first place. That 
supports my point that people would never voluntarily turn on the escaping 
until after they lost data.

I want to prevent that.

Google searches turn up a few more people having this very problem:
http://www.archivum.info/qt-interest@trolltech.com/2007-03/00659/Encoding-trouble-qDebug-console-WindowsXP-2000-cp1251-cp866-KOI8-R.html
http://www.qtcentre.org/archive/index.php/t-56384.html
http://osdir.com/ml/lib.qt.general/2008-04/msg00192.html
https://groups.google.com/forum/#!topic/qtturkiye/KRRiFDYvmLg

> People relied on the way it's worked before.

Read: people relied on broken behaviour.

> No one warned people about the change.

People were warned in the changelog.

> The change is not even backwards compatible to older qt versions.

For someone who relied on a bug, any bugfix is backwards-incompatible.

Any change is changing behaviour for someone. All of them rely on a judgement
call on whether the new behaviour has more advantages than the breakage it may
cause for people who may be relying on the old one.

> As Thiago refuses to admit there is a problem, we have to ask the
> authorities behind Qt to look into this.

I am the author of the change as well as the maintainer of the module the 
change went in. I can be overridden by the Chief Maintainer.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center