[Qt-interest] Parsing a HTML document with non-supported html subsets

Castagne Nicolas nicolascastagne at yahoo.fr
Tue Jul 21 14:03:58 CEST 2009


> This may not be the best solution, but I guess you could try to parse your  
> HTML file manually as an XML document.

Thanks Constantin,

I have tried that, but the HTML string generated by Excel is not a valid XML file :/

The html string has, for example, something like:
____
<body link="#0000d4" vlink="#993366">
<table border=0 cellpadding=0 cellspacing=0 width=190 style='border-collapse:
 collapse'>
___

and QDomDocument::setContent reports:
"unexpected character" at line 3, column 15

I guess that is because the attribute values (e.g. border) are not enclosed in quotation marks, which XML requires.
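One possible workaround is to quote the bare attribute values before handing the string to an XML parser. Below is a minimal sketch of that idea, not a tested fix for real Excel output. The helper name quoteBareAttributes is made up for illustration, and std::regex is used instead of Qt's own regular expression classes only to keep the example self-contained. It assumes that no "name=value" sequence appears in the text between tags or inside an already quoted value, and it does nothing about other XML problems such as unclosed tags.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Qt): wraps bare attribute values in
// double quotes, e.g. border=0 becomes border="0".
// Values that already start with a single or double quote are left alone.
std::string quoteBareAttributes(const std::string& html)
{
    // Group 1: whitespace followed by an attribute name (may contain ':' as in x:num).
    // Group 2: a value that contains no whitespace, quote character, or '>'.
    static const std::regex bareAttr(
        R"re((\s[A-Za-z_:][\w:.-]*)=([^\s"'>]+))re");

    // $1 and $2 refer to the two capture groups above.
    return std::regex_replace(html, bareAttr, "$1=\"$2\"");
}
```

Applied to the <table ...> line above, this should quote border, cellpadding, cellspacing and width while leaving the single-quoted style attribute untouched.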



Yeah, I know, Excel is bad :)

Any other hints are welcome. In the meantime I'll try parsing with regular expressions.
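For reference, a rough sketch of what the regular expression route could look like. The function name extractXNum is hypothetical, and std::regex stands in for Qt's regular expression classes so that the snippet compiles on its own. It assumes the x:num value is always double-quoted (as in the Excel output quoted below) and that the attribute sits inside a <td> tag.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the value of every x:num attribute that
// appears inside a <td ...> start tag, in document order.
std::vector<std::string> extractXNum(const std::string& html)
{
    // "<td", then any characters except '>' (so the match stays inside one
    // tag, even if the tag is wrapped over several lines), then x:num="...".
    static const std::regex xnum(
        R"re(<td\b[^>]*\sx:num="([^"]*)")re", std::regex::icase);

    std::vector<std::string> values;
    for (std::sregex_iterator it(html.begin(), html.end(), xnum), end;
         it != end; ++it) {
        values.push_back((*it)[1].str());  // capture group 1 = attribute value
    }
    return values;
}
```

The returned strings are the raw attribute text (for example 1.23456789012346E28), so converting them to numbers is left to the caller.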

Best-
Nicolas

--- On Tue, 21.7.09, Constantin Makshin <dinosaur-rus at users.sourceforge.net> wrote:

From: Constantin Makshin <dinosaur-rus at users.sourceforge.net>
Subject: Re: [Qt-interest] Parsing a HTML document with non-supported html subsets
To: "Qt-interest" <qt-interest at trolltech.com>
Date: Tuesday, 21 July 2009, 11:07

This may not be the best solution, but I guess you could try to parse your  
HTML file manually as an XML document.

On Tue, 21 Jul 2009 11:54:55 +0400, Castagne Nicolas  
<nicolascastagne at yahoo.fr> wrote:
> Hi all and thanks Frank.
>
>> Do you mean ?
>> QString text=file->readAll();
>> Browser->setHtml(text)
>
> Indeed, no.
>
> My problem is that I have an HTML string containing a table whose cells
> carry specific attributes, such as:
> ***********************
>   <tr height=13>
>    <td class=xl24 align=right width=75 x:num="1.23456789012346E28">1,23E+28</td>
>  </tr>
> ************************
>
> in which I need to retrieve the x:num attribute value in the <td> tag.
>
>
> Unfortunately, a call such as:
>     Browser->setHtml(text)
> removes some attributes that are present in text, i.e.:
>     Browser->toHtml()
> then returns a string without the x:num attribute.
>
> Hence: how should I parse my HTML string?
> Is there a **hack** with QTextDocument that would allow me to do so?
>
> Hope it is clearer.
> Best-
> Nicolas
>
>
> --- On Mon, 20.7.09, Frank Lutz <frank422542 at googlemail.com> wrote:
>
> From: Frank Lutz <frank422542 at googlemail.com>
> Subject: Re: [Qt-interest] Parsing a HTML document with non-supported html
> subsets
> To: qt-interest at trolltech.com
> Date: Monday, 20 July 2009, 16:53
>
> Do you mean ?
>
> ------
> QString text=file->readAll();
>        
> Browser->setHtml(text)
> --------
>
> greetings!
> _______________________________________________

-- 
Constantin "Dinosaur" Makshin