[Qt-interest] How to parse a paragraph twice with an XML stream parser?

Lingfa Yang lingfa at brandeis.edu
Tue May 12 16:34:38 CEST 2009


Thank you for your reply. Your scheme, caching the content, is feasible.

I can use QXmlStreamReader to grab the paragraph content and save it as 
a QString with the help of QXmlStreamWriter, then create a new 
QXmlStreamReader to read the string.

The problem is efficiency. 1) QXmlStreamReader does not give me the raw 
data. For example, any '<' or '>' characters I get have already been 
unescaped from &lt; and &gt; in the XML file, and writing the content 
out has to escape them again, which sounds like extra cost. 2) Under 
the hood, reading with QXmlStreamReader is a tokenizing process. Every 
time I create a parser to read the string, it tokenizes again. Is it 
possible to cache the tokens and read the cached tokens instead?
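
To make point 2 concrete, here is a minimal sketch of a token cache in 
plain C++, without Qt. The names Token, TokenCache, sampleParagraph, 
countStartElements and collectText are hypothetical, made up for this 
illustration. In a Qt program the cache would presumably be filled 
during the single real pass by calling QXmlStreamReader::readNext() and 
copying tokenType(), name(), text() and attributes() into each record. 
Here a hand-built paragraph stands in for that pass.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token record. With Qt, the fields would be copied from
// QXmlStreamReader::tokenType(), name(), text() and attributes().
enum class TokenType { StartElement, EndElement, Characters };

struct Token {
    TokenType type;
    std::string name;  // element name, empty for Characters
    std::string text;  // already-unescaped text, empty for elements
    std::vector<std::pair<std::string, std::string>> attributes;
};

using TokenCache = std::vector<Token>;

// Stand-in for the one tokenizing pass over <p style="body">...</p>.
// Text is stored unescaped, so replaying it needs no re-escaping.
TokenCache sampleParagraph() {
    TokenCache cache;
    Token open{TokenType::StartElement, "p", "", {}};
    open.attributes.emplace_back("style", "body");
    cache.push_back(open);
    cache.push_back(Token{TokenType::Characters, "", "a < b", {}});
    cache.push_back(Token{TokenType::StartElement, "b", "", {}});
    cache.push_back(Token{TokenType::Characters, "", " and c > d", {}});
    cache.push_back(Token{TokenType::EndElement, "b", "", {}});
    cache.push_back(Token{TokenType::EndElement, "p", "", {}});
    // Sanity check: the cached paragraph opens and closes the same element.
    assert(cache.front().name == cache.back().name);
    return cache;
}

// First "read": walk the cached tokens and count start elements.
int countStartElements(const TokenCache &cache) {
    int count = 0;
    for (const Token &t : cache)
        if (t.type == TokenType::StartElement) ++count;
    return count;
}

// Second "read": walk the same tokens again and gather the text.
std::string collectText(const TokenCache &cache) {
    std::string out;
    for (const Token &t : cache)
        if (t.type == TokenType::Characters) out += t.text;
    return out;
}
```

Both passes iterate over the same vector, so the paragraph is tokenized 
once and nothing is escaped or written back out as XML. The cost is the 
memory for one paragraph's tokens at a time, and the cache can be 
cleared when the closing p tag has been handled.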

Lingfa

Paul Miller wrote:
> Lingfa Yang wrote:
>> Hi XML experts or XML application developers,
>>
>> I am using QXmlStreamReader to parse document.xml. The file can be 
>> huge, so I prefer this parser to DOM.
>>
>> The document.xml contains many paragraphs. I have to parse each 
>> paragraph twice. I wish QXmlStreamReader could remember the start 
>> element of each p (paragraph) tag, so that when the first read 
>> finishes I could reset to the p tag and read it a second time. It 
>> seems QXmlStreamReader cannot do that.
>
> The stream reader simply can't do that - remember a "stream" could be 
> a sequence of characters sent over a wire (like the Internet) and it's 
> not cached anywhere.
>
> What you should probably do is cache the contents of the element 
> yourself. When you encounter one of these that you need to "read" 
> twice, store the contents and related attributes in your own cache, 
> then when you need to "read" it again, just read from your copy.
>



