[Qt-interest] Parsing a HTML document with non-supported htmlsubsets

Castagne Nicolas nicolascastagne at yahoo.fr
Wed Jul 22 09:55:04 CEST 2009


Hello Tony,

It can, through export. But I am working on copy/paste features.

The problem is that Excel's copy function does not put XHTML on the clipboard :(

Excel's copy feature only puts the text/html and text/plain MIME types on the clipboard,
and only the text/html data encodes the "full value" of a cell, in an x:num attribute of the <td> tags.

I have attached the text/html content copied from Excel.

Well, anyway, I'll manage with regexp parsing, but that's not ideal.
This is the third time I've thought that Qt would benefit from having an HTML parser :/

Best-
Nicolas

--- On Wed 22.7.09, Tony Rietwyk <tony.rietwyk at rightsoft.com.au> wrote:

From: Tony Rietwyk <tony.rietwyk at rightsoft.com.au>
Subject: RE: [Qt-interest] Parsing a HTML document with non-supported htmlsubsets
To: "'Castagne Nicolas'" <nicolascastagne at yahoo.fr>
Date: Wednesday 22 July 2009, 2:33



 

Hi Nicolas,

Can Excel output XHTML rather than HTML?

Regards,

  
-----Original Message-----
From: qt-interest-bounces at trolltech.com [mailto:qt-interest-bounces at trolltech.com] On Behalf Of Castagne Nicolas
Sent: Tuesday, 21 July 2009 22:04
To: Qt-interest
Subject: Re: [Qt-interest] Parsing a HTML document with non-supported htmlsubsets


  
    
    
> This may not be the best solution, but I guess you could try to parse your
> HTML file manually as an XML document.

Thanks Constantin,

I have tried that, but the HTML string generated by Excel is not valid XML :/

The HTML string has, for example, something like:
____
<body link="#0000d4" vlink="#993366">
<table border=0 cellpadding=0 cellspacing=0 width=190 style='border-collapse:
 collapse'>
___

and QDomDocument::setContent complains:
"unexpected character" on line 3, column 15

I guess that is because the attribute values (e.g. for border) are not enclosed in quotes.


Yeah, I know, Excel is bad :)

Any other hints are welcome. I'll try parsing with regexps.

Best-
Nicolas



      
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ExcelClipboardContent_text_html.txt
Url: http://lists.qt-project.org/pipermail/qt-interest-old/attachments/20090722/9c52d5c6/attachment.txt 


More information about the Qt-interest-old mailing list