[Development] utf-8 BOM and parsers

Olivier Goffart olivier at woboq.com
Mon Apr 14 18:29:26 CEST 2014


On Monday 14 April 2014 07:14:44 Thiago Macieira wrote:
> On Mon 14 Apr 2014, at 15:13:53, Frank Osterfeld wrote:
> > On 14 Apr 2014, at 14:26, Simon Hausmann <simon.hausmann at digia.com> wrote:
> > > Since this affects not just one place but many (and for example we have
> > > many copies of the QML lexer around), I'd like to determine what the
> > > _correct_ fix for this issue is, because frankly speaking I don't know
> > > :). However I have an interest in the same fix being applied to qtbase,
> > > qtdeclarative, qtscript, qtcreator and other affected modules.
> > 
> > Even more critical, this behavioural change won’t only affect Qt modules,
> > but also a lot of customer code, which cannot be fixed by us. Which makes
> > me wonder if such a change between 5.2 and 5.3 is acceptable at all.
> > Was it intentional or an unintended side-effect? I can’t find any
> > discussion about the issue.
> 
> It was intentional as part of the UTF-8 codec rewrite.
> 
> > > 3) I noticed that QString::fromUtf8() differs from QTextCodec in this
> > > aspect. Is that intentional?
> > 
> > That inconsistency makes it even more confusing to me.
> 
> QTextCodec is stateful and allows you to choose, as one of the options,
> whether to ignore the BOM or not. QString::fromUtf8 is stateless.
> 
> Anyway, I don't want to change the behaviour back, but if the consensus is
> that it should be done, I'll prepare a patch and send it to release.


What was the reason for changing that behaviour?
Personally, I think it's safer to keep the 5.2 behaviour and avoid breaking
users' code.
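
Whichever way this is decided, a parser can protect itself by dropping a
leading UTF-8 BOM (the bytes EF BB BF) from the raw input before decoding it.
A minimal sketch in plain C++, assuming the input is already in memory;
skipUtf8Bom() is a hypothetical helper, not a Qt API:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Qt): returns a copy of `bytes` without a
// leading UTF-8 byte order mark. Input without a BOM is returned unchanged.
std::string skipUtf8Bom(const std::string &bytes)
{
    static const char bom[] = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8
    // compare() clamps the range, so inputs shorter than 3 bytes are safe.
    if (bytes.compare(0, 3, bom) == 0)
        return bytes.substr(3);
    return bytes;
}
```

A caller would pass the file contents through skipUtf8Bom() before handing
them to the decoder or lexer, so the result does not depend on whether the
decoder itself strips the BOM.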

-- 
Olivier 

Woboq - Qt services and support - http://woboq.com - http://code.woboq.org


