[Qt-interest] Why QXmlStreamReader reports XML content in a broken way?

Lingfa Yang lingfa at brandeis.edu
Fri Jan 2 17:26:53 CET 2009


Thank you for your answer. My question is about QXmlStreamReader, not 
QXmlSimpleReader.
QXmlStreamReader, which was introduced in Qt 4.3, is a faster and more 
convenient replacement for QXmlSimpleReader.

Meanwhile, I checked another parser, Xerces 
(http://en.wikipedia.org/wiki/Xerces), where element content is not 
broken up at entity references. This makes me curious why 
QXmlStreamReader does so. Is this a defect or an improvement?
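
Whichever it is, the pieces can be merged on the caller's side. Below is a 
minimal sketch of that idea in plain C++, not Qt code: Token, TokenType, 
exampleTokens(), elementText() and checkExample() are hypothetical names that 
only model a pull parser's token stream. With QXmlStreamReader the same loop 
would presumably append text() for every Characters token until the 
EndElement arrives.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a pull parser's token stream; not a Qt type.
enum class TokenType { StartElement, Characters, EndElement };

struct Token {
    TokenType type;
    std::string text;  // element name, or one chunk of character data
};

// The token sequence described in the original question for
// <p>He said: &quot;I&apos;ll come again.&quot;</p>
std::vector<Token> exampleTokens() {
    return {
        {TokenType::StartElement, "p"},
        {TokenType::Characters, "He said: "},
        {TokenType::Characters, "\"I"},
        {TokenType::Characters, "'ll come again."},
        {TokenType::Characters, "\""},
        {TokenType::EndElement, "p"},
    };
}

// Concatenate every Characters chunk between the first StartElement and
// the first EndElement that follows it. Nested elements are not handled.
std::string elementText(const std::vector<Token>& tokens) {
    std::string result;
    bool inside = false;
    for (const Token& t : tokens) {
        if (t.type == TokenType::StartElement) {
            inside = true;
        } else if (t.type == TokenType::EndElement) {
            break;
        } else if (inside) {
            result += t.text;  // append this chunk to the element's text
        }
    }
    return result;
}

// Checks that the four chunks join back into the single expected string.
void checkExample() {
    assert(elementText(exampleTokens()) == "He said: \"I'll come again.\"");
}
```

Qt also provides QXmlStreamReader::readElementText(), which, when called on a 
StartElement, returns the text up to the matching EndElement in one call.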

Thanks,
Lingfa

Ben Bridgwater wrote:
> I think it's because "escaped characters" are really considered in the 
> XML specification as "entity references". The entities "quot", "apos" 
> etc are predefined in the specification, but you can also define your 
> own using declarations like <!ENTITY myentity "Replacement text">.
>
> Per the QXmlSimpleReader documentation the behavior you want should be 
> the default, but you could try setting it explicitly via:
>
> QXmlSimpleReader::setFeature("http://trolltech.com/xml/features/report-start-end-entity", 
> false)
>
> Ben
>
> Lingfa Yang wrote:
>   
>> QXmlStreamReader users:
>>
>> A normal XML element is supposed to be reported as three tokens: 
>> StartElement, Characters, and EndElement.
>> But in this element:
>> <p>He said: &quot;I&apos;ll come again.&quot;</p>
>> the content is reported four times:
>> 1: "He said: "
>> 2: ""I"
>> 3: "'ll come again."
>> 4: """
>>
>> Does anyone know why QXmlStreamReader reports the content four times, 
>> instead of once, as: He said: "I'll come again."
>> Is there a benefit to this design, or does it make escaped characters easier to detect?
>>
>> Thanks,
>> Lingfa
>>     