[Qt-interest] Parsing XML page
Ross Bencina
rossb-lists at audiomulch.com
Tue Mar 29 20:20:13 CEST 2011
??????? ??????? wrote:
> But what if I need to read an HTML page and extract a block of text...
Before, you said it was XML; now it's HTML. There's quite a difference, you
know, especially if it isn't XHTML that is also well-formed XML.
You need to be clear on whether your source is guaranteed valid XML.
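A quick way to check is to hand the page to a strict XML parser, which rejects ordinary HTML that isn't well-formed. Below is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. The sample markup and the helper name `is_well_formed_xml` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# XHTML written as well-formed XML: every element is closed.
xhtml = "<html><body><p>Hello<br/>world</p></body></html>"
# Typical HTML: <br> and <p> are left unclosed, which HTML allows but XML does not.
html = "<html><body><p>Hello<br>world</body></html>"

print(is_well_formed_xml(xhtml))  # True
print(is_well_formed_xml(html))   # False
```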
For parsing junk HTML that may not be XML and may contain errors, I've used
BeautifulSoup (a Python module; not perfect, but it works). I'm not sure Qt
has anything similar.
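BeautifulSoup is a third-party package. As a rough, stdlib-only stand-in (this is not BeautifulSoup itself), Python's `html.parser.HTMLParser` also tolerates unclosed tags and HTML entities. The sketch below collects the text of every `<div>` on a sloppy page. The class `TagTextCollector` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every <tag>...</tag>, ignoring all other markup.

    Only the target tag is tracked, so unclosed <br>, <p> or <li> elements
    elsewhere in the page do not break the parse.
    """

    def __init__(self, tag: str) -> None:
        super().__init__(convert_charrefs=True)  # turn &nbsp; etc. into characters
        self.tag = tag
        self.depth = 0                  # how many <tag> elements we are inside
        self.blocks: list[str] = []     # finished text blocks
        self._current: list[str] = []   # text pieces of the block in progress

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Finished an outermost <tag> element: store its text.
                self.blocks.append("".join(self._current).strip())
                self._current = []

    def handle_data(self, data):
        if self.depth > 0:
            self._current.append(data)


# Sloppy HTML: unclosed <p> and <br>, an HTML-only entity (&nbsp;), no </html>.
page = """<html><body>
<p>Intro text<br>
<div class="post">First&nbsp;block <b>bold</b> text</div>
<div class="post">Second block</div>
</body>"""

collector = TagTextCollector("div")
collector.feed(page)
collector.close()
print(collector.blocks)
```

This sketch will still go wrong if the target tag itself is left unclosed. A parser that repairs the whole tree, as BeautifulSoup tries to, copes with that case better.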
You're also not being clear about how the data is structured. Usually an XML
file would have all the fields you need marked up in separate elements: just
extract the text of the relevant elements and you're done. If you need to do
free-form parsing of text _within_ XML elements, it might be appropriate to
use regex there, but that wouldn't be for parsing XML; it would be for
parsing individual element text below the level of the XML structure.
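Here is a minimal sketch of that split in Python (standard library only). The XML parser handles the structure, and a regex only ever sees the plain text of one element. The element names and the "Price:" format are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: the structure is handled by the XML parser,
# while a free-form "Price: ..." phrase inside <note> is handled by a regex.
doc = """<items>
  <item><name>Widget</name><note>Price: 12.50 EUR, in stock</note></item>
  <item><name>Gadget</name><note>Price: 7 EUR, backordered</note></item>
</items>"""

root = ET.fromstring(doc)

# Matches "Price: <number> <currency code>" inside one element's text only.
PRICE_RE = re.compile(r"Price:\s*(\d+(?:\.\d+)?)\s*([A-Z]{3})")

results = []
for item in root.iter("item"):
    name = item.findtext("name")              # text of the <name> child
    note = item.findtext("note", default="")  # text of the <note> child
    m = PRICE_RE.search(note)                 # regex runs on plain text, not on XML
    if m:
        results.append((name, float(m.group(1)), m.group(2)))

print(results)  # [('Widget', 12.5, 'EUR'), ('Gadget', 7.0, 'EUR')]
```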
>> NEVER, EVER, on any platform, in any language, even try to parse XML
>> using RegEx. NEVER.
>> I am writing this so that Google will also show this answer when a
>> random Joe looks for XML parsing. Now let's continue with our lives.
Agreed
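Here is a small Python example that makes the point concrete (the sample input is invented). A regex cuts a nested element in half, while a real parser gets it right.

```python
import re
import xml.etree.ElementTree as ET

# Nested elements with the same tag name: legal XML, awkward for a regex.
doc = "<b>outer <b>inner</b> tail</b>"

# Non-greedy regex: stops at the first </b>, so it cuts the element in half.
regex_text = re.search(r"<b>(.*?)</b>", doc).group(1)

# A real parser understands nesting; itertext() yields all descendant text.
parser_text = "".join(ET.fromstring(doc).itertext())

print(repr(regex_text))   # 'outer <b>inner'
print(repr(parser_text))  # 'outer inner tail'
```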
Ross.