[Interest] make qjsondocument recognize utf8 as utf8

Thiago Macieira thiago.macieira at intel.com
Thu Dec 31 22:57:12 CET 2015


On Thursday 31 December 2015 19:29:11 Thiago Macieira wrote:
> > The offset: 494
> > 
> > On Ubuntu 15.10, with the commercial 5.5.1, the output is Valid.
> > 
> > > For that matter, in the file that produces the error, is it using CRLF
> > > line endings? The one from your email does.
> > 
> > It doesn't matter: I ran dos2unix on it, but it gives the same result.
> 
> With the same offset?

With CRLF line endings, offset 494 is nowhere near a multi-byte UTF-8 sequence 
(it's the first 'y' in "Styczny normalny pędzel").
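
The line-ending convention matters here because a CRLF file carries one extra 
byte for every line break before a given position, so the same numeric offset 
points at different characters in the two variants of the file. A minimal 
sketch of that arithmetic in plain C++ (the function name lfToCrlfOffset is 
made up for illustration and is not part of Qt):

```cpp
#include <cstddef>
#include <string>

// Map a byte offset in LF-terminated text to the offset of the same
// character in the CRLF version of that text. Every '\n' located before
// the offset gains one '\r', shifting the position by one byte each.
std::size_t lfToCrlfOffset(const std::string &lfText, std::size_t lfOffset)
{
    std::size_t newlines = 0;
    for (std::size_t i = 0; i < lfOffset && i < lfText.size(); ++i) {
        if (lfText[i] == '\n')
            ++newlines;
    }
    return lfOffset + newlines;
}
```

The further into the file an offset is, the more line breaks precede it and 
the larger the difference between the two offsets becomes.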

With LF line endings, offset 494 is one byte before that 'ę'. I've checked and 
QJsonDocument's parsing does NOT have an off-by-one error when reporting 
invalid UTF-8 sequences.

$ echo -e "[\"\x80\"]" | ./jsonvalidator /dev/stdin
Invalid "invalid UTF8 string" 2
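
For anyone who wants to cross-check a suspect file independently of Qt, here 
is a minimal sketch of a standalone UTF-8 check that returns the byte offset 
of the first invalid sequence. It is NOT the code from qutfcodec_p.h, and the 
name firstInvalidUtf8 is an assumption made for this example:

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first invalid UTF-8 sequence in `data`,
// or std::string::npos if the whole buffer is well-formed.
std::size_t firstInvalidUtf8(const std::string &data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(data[i]);
        if (b < 0x80) {             // plain ASCII byte
            ++i;
            continue;
        }

        std::size_t len = 0;        // total length of the sequence
        unsigned long minCp = 0;    // smallest code point allowed at this length
        unsigned long cp = 0;       // decoded code point
        if ((b & 0xE0) == 0xC0)      { len = 2; minCp = 0x80;    cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; minCp = 0x800;   cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; minCp = 0x10000; cp = b & 0x07; }
        else return i;              // stray continuation byte or invalid lead byte

        if (i + len > n)
            return i;               // sequence is cut off by the end of the buffer

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80)
                return i;           // expected a continuation byte (10xxxxxx)
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates and values past U+10FFFF.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return i;

        i += len;
    }
    return std::string::npos;
}
```

Fed the same bytes as the echo command above ('[', '"', 0x80, '"', ']', 
newline), this sketch returns 2, the same offset that jsonvalidator reports. 
A correctly encoded 'ę' (bytes 0xC4 0x99) passes the check.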

There have been no changes to either qjsonparser.cpp or qutfcodec_p.h since 
v5.5.1 that could account for this bug being fixed.

The only two explanations that I can think of for this problem are that 
either the CentOS packages applied a patch that broke the UTF-8 parsing or 
their compiler is generating bad code.

Sorry.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



