[Qt-interest] QXmlStreamReader: Usage of readNextStartElement() [SOLVED]?

Wed Dec 22 11:14:07 CET 2010

On Wednesday 22 December 2010 12:01:08 Oliver.Knoll at comit.ch wrote:
> Constantin, David,
> 
> Thanks for your help!
> 
> On 2010-12-21 Constantin Makshin wrote:
> 
> > The general idea is:
> > If you use readNextStartElement() and want to get to the next element
> > on the same level, then before calling this function again use
> > readElementText() if the current element may have text and
> > skipCurrentElement() otherwise.
> 
> Yes, that makes sense, and it is also what I had more or less assumed, but I had trouble mapping it to the description in the Qt docs. Especially the part which says "When the parser has reached the end element, the current element becomes the parent element." gave me the impression that the parser would somehow resume until the end element of the /first/ (outermost) element was hit (the "first" element being the element that was current when readNextStartElement() was first called... uh...)
> 
> Example (which is wrong, but just to illustrate my understanding of the Qt docs):
> 
> <current>
>   <- parsing is here
>   <foo>
>     <bar />
>   </foo>
>   <a>...</a>
>   <b />
> </current>
> 
> Assume the <current> element would be the "first current element", and the parser would be at the position marked with "<-". Now we would do:
> 
> - readNextStartElement(): returns <foo>, "current element" becomes <foo> (previously: <current>)
> - readNextStartElement(): returns <bar>, "current element" becomes <bar />
> - readNextStartElement(): Since we just hit the end element of <bar />, the "current element" becomes
>   the parent element again, which is <foo>...
> - ... and since we just hit the </foo> end element, the "current element" becomes the parent element
>   again, which is <current>, and hence <a> would be returned
> - readNextStartElement(): same logic as before, the </a> end element is hit, the parent <current> becomes
>   the "current element" again, so <b /> is returned
> - ...
> - Until finally we hit </current> end element, which is the end element of the /first/ (or outermost)
>   element and hence false would be finally returned
I agree that one could assume such behavior from the function's name. But it would not be very useful, because it would flatten the XML structure, making your example look the same as this:
<current />
<foo />
<bar />
<a />
<b />

The problem is that in your scenario the function that calls readNextStartElement() can't know that "<bar>" is a child element of "<foo>" simply because it won't be able to find "</foo>" (as readNextStartElement() would just silently skip it).
The current behavior is better, and the Qt "Stream Bookmarks" example shows why: you can split the XML reading function into smaller pieces for the various elements. In your example it might look like this (let's assume the reading function is called "readXml"):
- after creating the QXmlStreamReader object you find the top element ("<current>") and start parsing its contents by calling readNextStartElement() in a loop;
- when you see the "<foo>" element, you call your parseFoo() function, which starts its own readNextStartElement() loop to parse the child elements of "<foo>";
- when readNextStartElement() in parseFoo() returns "false", parseFoo() knows it has found "</foo>" and gives control back to the readXml() function;
- readXml() can safely continue its parsing loop because it can be sure parseFoo() handled all "<foo>" children and the current element is "<current>" again.
The idea is that when you want to parse an element's child elements, you just start another readNextStartElement() loop and when this inner loop ends, you're back in the element you were parsing.

> Anyway, I understand that the above understanding is wrong and that basically readNextStartElement will just try to find the next start element, but will stop as soon as it hits /any/ end element (which is /not/ necessarily the end element of the "outermost" element!). So with that in mind it is important to know when to expect any end element, and "skip" these and "consume" the text values <in>between</in>.
> 
> So again thanks a lot for clarifying this!
The logic behind readNextStartElement() may be a bit confusing at first, but it's convenient.

> Cheers, Oliver
> --
> Oliver Knoll
> Dipl. Informatik-Ing. ETH
> COMIT AG - ++41 79 520 95 22