[Development] utf-8 BOM and parsers

Rutledge Shawn Shawn.Rutledge at digia.com
Wed Apr 16 13:44:15 CEST 2014


On 14 Apr 2014, at 2:26 PM, Simon Hausmann wrote:

> We have various parsers in Qt that parse "source code" and do things with it, 
> such as the QML parser…

This issue baffled me for a couple of hours this morning: I tried to "port" some QML code that was working fine with Qt 5.2.1 to 5.3.0 and got a nonsense error message saying that a numeric value was expected at row 1, col 1 of the file.  I didn't suspect the BOM until I showed the message to someone who happened to have seen this before.

So as of this moment we are definitely not compliant with Postel's Law (http://en.wikipedia.org/wiki/Robustness_principle): we should ignore the BOM rather than being surprised by it, but also not require it.  Others probably have a better idea where to make this change, but if the lower layers insist on keeping the BOM, then the parser will have to ignore it (the "be liberal in what you accept" part of Postel's Law).  This needs to be fixed before we ship; otherwise it will be a show-stopper for anyone who can't figure out what this impertinent error message really means.

I never intentionally put that BOM there, so some old version of Creator or some other editor must have added it.  I do agree with the principle that BOMs should be avoided, because UTF-8 is a good default assumption about the encoding if it's otherwise unknown.

Are the parsers still generated by qlalr?  Then maybe the fix could go there.
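
To illustrate what "ignore the BOM" would mean in practice, here is a minimal standalone sketch.  It is not the actual Qt or qlalr code: the function name stripUtf8Bom is made up for this example, and it works on a plain std::string of raw UTF-8 bytes rather than on whatever buffer type the real lexer uses.

```cpp
#include <string>

// Hypothetical helper, not part of Qt: returns the source text without a
// leading UTF-8 BOM (bytes EF BB BF). Input that has no BOM, or that is
// shorter than three bytes, is returned unchanged.
std::string stripUtf8Bom(const std::string &source)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (source.compare(0, bom.size(), bom) == 0)
        return source.substr(bom.size());
    return source;
}
```

A parser front end could pass the file contents through such a function before tokenizing, so that files with and without a BOM yield the same token stream and row 1, col 1 refers to the first real character.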

It might also be a good idea for Creator to strip the BOM, or at least warn about it, if it's really inadvisable ever to have one in a source file (the "be conservative in what you send" part of Postel's Law).



