[Development] utf-8 BOM and parsers

Thiago Macieira thiago.macieira at intel.com
Mon Apr 14 16:14:44 CEST 2014


On Mon, 14 Apr 2014, at 15:13:53, Frank Osterfeld wrote:
> On 14 Apr 2014, at 14:26, Simon Hausmann <simon.hausmann at digia.com> wrote:
> > Since this affects not just one place but many (and for example we have
> > many copies of the QML lexer around), I'd like to determine what the
> > _correct_ fix for this issue is, because frankly speaking I don't know
> > :). However I have an interest in the same fix being applied to qtbase,
> > qtdeclarative, qtscript, qtcreator and other affected modules.
> 
> Even more critical, this behavioural change won’t only affect Qt modules,
> but also a lot of customer code, which cannot be fixed by us. Which makes
> me wonder if such a change between 5.2 and 5.3 is acceptable at all.
> Was it intentional or an unintended side-effect? I can’t find any
> discussion about the issue.

It was intentional as part of the UTF-8 codec rewrite.

> > 3) I noticed that QString::fromUtf8() differs from QTextCodec in this
> > aspect. Is that intentional?
> 
> That inconsistency makes it even more confusing to me.

QTextCodec is stateful and allows you to choose, as one of the options, 
whether to ignore the BOM or not. QString::fromUtf8 is stateless.
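
To illustrate the stateful/stateless distinction, here is a minimal sketch in
plain C++ working on raw bytes. It is not Qt's implementation: the names
decodeStateless and StreamDecoder and the stripLeadingBom option are made up
for this example. The point is only that a decoder which remembers its
position in a stream can offer a choice about a leading BOM, even when the
BOM is split across chunks, while a stateless function treats every call in
isolation.

```cpp
#include <string>

// Hypothetical names throughout; this is an illustration, not Qt code.
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Stateless: each call is independent, so the function has no notion of
// "start of stream". In this sketch it passes the bytes through unchanged.
std::string decodeStateless(const std::string &bytes)
{
    return bytes;
}

// Stateful: remembers whether the start of the stream has been handled and
// takes an option that controls whether a leading BOM is dropped.
class StreamDecoder
{
public:
    explicit StreamDecoder(bool stripLeadingBom) : strip_(stripLeadingBom) {}

    // Feed the next chunk of the stream and get back the bytes to keep.
    std::string feed(const std::string &chunk)
    {
        if (headerDone_ || !strip_)
            return chunk;

        pending_ += chunk;
        // Too few bytes to decide, and what we have is still a BOM prefix:
        // hold the bytes back until the next chunk arrives.
        if (pending_.size() < kUtf8Bom.size()
            && kUtf8Bom.compare(0, pending_.size(), pending_) == 0)
            return std::string();

        headerDone_ = true;
        std::string out = pending_;
        if (out.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0)
            out.erase(0, kUtf8Bom.size());
        pending_.clear();
        return out;
    }

    // Call at end of stream to release bytes that were held back.
    std::string finish()
    {
        std::string out = pending_;
        pending_.clear();
        headerDone_ = true;
        return out;
    }

private:
    bool strip_;
    bool headerDone_ = false;
    std::string pending_;
};
```

With stripLeadingBom set, only a BOM at the very start of the stream is
dropped; the same three bytes later in the stream are kept. The stateless
function cannot make that distinction, because it never knows whether its
input is the beginning of anything.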

Anyway, I don't want to change the behaviour back, but if the consensus is 
that it should be done, I'll prepare a patch and send it to release.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
