[Interest] How to add tidy using Qt?

程梁 devbean at outlook.com
Thu Dec 27 09:40:12 CET 2012


Thank you! I solved the problem as Tony suggested. But could you tell me why this happens? Why do I have to assign the QByteArray to another variable? And why is only the content wrong?

Cheng Liang
Nanjing, China
http://www.devbean.info

Date: Thu, 27 Dec 2012 16:30:50 +0800
Subject: Re: [Interest] How to add tidy using Qt?
From: dbzhang800 at gmail.com
To: devbean at outlook.com
CC: tony at rightsoft.com.au; interest at qt-project.org


On Thu, Dec 27, 2012 at 3:38 PM, 程梁 <devbean at outlook.com> wrote:




Nope, I've done nothing else; it's just code for testing. So they are all in the same function:

const char* html = d->visualView->toFormattedHtml().toUtf8().constData();

No, this line is wrong. You should do as Tony told you. 

TidyBuffer output;
TidyDoc tdoc = tidyCreate();
tidyBufInit(&output);
Bool ok = tidyOptSetBool(tdoc, TidyXmlOut, yes);
int rc = -1;

// config tidy
if ( ok ) ok = tidyOptSetBool( tdoc, TidyXmlDecl, no );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyDropPropAttrs, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyMakeBare, yes );
if ( ok ) ok = tidyOptSetValue( tdoc, TidyBodyOnly, "yes" );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyDropFontTags, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyFixComments, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyEscapeCdata, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyJoinStyles, yes );
if ( ok ) ok = tidyOptSetBool( tdoc, TidyEscapeCdata, yes);
if ( ok ) ok = tidyOptSetBool( tdoc, TidyHideComments, yes);
if ( ok ) ok = tidyOptSetBool( tdoc, TidyForceOutput, yes);
tidySetCharEncoding(tdoc, "utf8");
// start parsing

if (ok) {
    rc = tidyParseString(tdoc, html);
}
if (rc >= 0) {
    rc = tidyCleanAndRepair(tdoc);
}

if (rc >= 0) {
    rc = tidySaveBuffer(tdoc, &output);
}

if (rc >= 0) {
    printf("\nAnd here is the result:\n\n%s", output.bp);


    QByteArray ret = QByteArray(reinterpret_cast<char*>(output.bp),output.size);
} else {
    printf("A severe error (%d) occurred.\n", rc);
}

tidyBufFree(&output);
tidyRelease(tdoc);


This is all the code I use. The output is:

<html>
<head>
<meta name="generator"
content="HTML Tidy for Windows (vers 25 March 2009), see www.w3.org" />

<title></title>
</head>
<body>��@- at H a�S�������Ýï¿ï¿½ï¿½</body>
</html>

All tags generated by QWebView are correct, but the text I edited in the web view (using page()->setContentEditable(true);) is incorrect. So I think the encoding is wrong. But I do use toUtf8() to get the char * (toUtf8().constData()).



Cheng Liang
Nanjing, China
http://www.devbean.info


From: tony at rightsoft.com.au
To: interest at qt-project.org
Date: Thu, 27 Dec 2012 18:15:49 +1100

Subject: Re: [Interest] How to add tidy using Qt?

Hi Cheng,
You didn't show how the html variable was used. Are you sure that the html pointer is valid? I suggest:

QByteArray ba = webView->page()->mainFrame()->toHtml().toUtf8();
const char *html = ba.constData();
...

In your original code, the temporary string and byte array were destroyed at the end of that statement, leaving html as a dangling pointer. Even if the memory looked valid immediately afterwards, subsequent allocations would have made the area appear corrupt. By storing the byte array in a named variable, you ensure that its data stays valid for the whole block.
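
The lifetime rule behind this can be observed without Qt. The sketch below is only an illustration: Bytes is a hypothetical stand-in for QByteArray, and its alive flag exists purely to make the destruction point visible. It shows that a temporary is destroyed at the end of the full statement that created it, while a named variable lives until the end of its block.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for QByteArray: owns its bytes and clears a flag
// when it is destroyed, so the destruction point can be observed.
class Bytes {
public:
    Bytes(const std::string &text, bool *alive) : data_(text), alive_(alive) { *alive_ = true; }
    ~Bytes() { *alive_ = false; }
    Bytes(const Bytes &) = delete;
    Bytes &operator=(const Bytes &) = delete;
    const char *constData() const { return data_.c_str(); }

private:
    std::string data_;
    bool *alive_;
};

// Mirrors the original code: the pointer is taken from a temporary.
// Reports whether the owner still exists on the following statement.
bool ownerAliveAfterTemporary()
{
    bool alive = false;
    const char *html = Bytes("<p>hi</p>", &alive).constData();
    (void)html;   // dangling from here on; it must not be dereferenced
    return alive; // the temporary was destroyed at the end of the statement above
}

// Mirrors the suggested fix: the owner is a named local variable.
bool ownerAliveWithNamedVariable()
{
    bool alive = false;
    Bytes ba("<p>hi</p>", &alive);
    const char *html = ba.constData();
    assert(std::string(html) == "<p>hi</p>"); // safe: ba is still in scope
    return alive; // ba is destroyed only after the return value has been read
}
```

Calling ownerAliveAfterTemporary() reports that the owner is already gone, whereas ownerAliveWithNamedVariable() reports that it is still there. The same reasoning applies to the QByteArray returned by toUtf8().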
 Tony.
  
From: interest-bounces+tony=rightsoft.com.au at qt-project.org [mailto:interest-bounces+tony=rightsoft.com.au at qt-project.org] On Behalf Of 程梁

Sent: Thursday, 27 December 2012 4:52 PM
To: Qt Interest
Subject: [Interest] How to add tidy using Qt?

Hi, there! I want to add tidy as a library to a Qt program, where I use code like the following:


const char* html = webView->page()->mainFrame()->toHtml().toUtf8().constData();

// ...
TidyBuffer output;
Bool ok = tidyOptSetBool(tdoc, TidyXmlOut, yes);

tidySetCharEncoding(tdoc, "utf8");
tidySaveBuffer(tdoc, &output);

QByteArray ret = QByteArray(reinterpret_cast<char*>(output.bp),output.size);


But I got these warnings:

UTF-8 decoding error of 1 bytes : 0xff = U+0255lx
UTF-8 decoding error of 1 bytes : 0xff = U+0255lx
UTF-8 decoding error of 1 bytes : 0xe8 = U+0008lx
UTF-8 decoding error of 1 bytes : 0xe8 = U+0008lx
UTF-8 decoding error of 1 bytes : 0xfd = U+0001lx
UTF-8 decoding error of 1 bytes : 0xfd = U+0001lx
UTF-8 decoding error of 1 bytes : 0xfd = U+0001lx
UTF-8 decoding error of 1 bytes : 0xfd = U+0001lx

I've no idea how this happens. Could you help me? Thank you!
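
One way to read these warnings: the reported bytes are not what a UTF-8 encoder would emit for ordinary text. Under RFC 3629, 0xff and 0xfd never occur in UTF-8 at all, and 0xe8 is only valid as the first byte of a three-byte sequence. The sketch below checks this; the function names utf8SequenceLength and isCompleteSequence are made up for illustration and are not part of tidy or Qt. The result is consistent with the parser reading memory that no longer held the UTF-8 data, though the code by itself does not prove that.

```cpp
#include <cassert>

// Returns the total length of the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` can never start a sequence under RFC 3629.
int utf8SequenceLength(unsigned char lead)
{
    if (lead <= 0x7f) return 1;                 // ASCII
    if (lead >= 0xc2 && lead <= 0xdf) return 2; // two-byte sequence
    if (lead >= 0xe0 && lead <= 0xef) return 3; // three-byte sequence
    if (lead >= 0xf0 && lead <= 0xf4) return 4; // four-byte sequence
    return 0; // continuation bytes 0x80-0xbf, overlong leads 0xc0/0xc1, and 0xf5-0xff
}

// Structural check only: the lead byte must announce exactly `count` bytes,
// and every following byte must have the continuation form 10xxxxxx.
// Overlong encodings and surrogates are deliberately not examined here.
bool isCompleteSequence(const unsigned char *bytes, int count)
{
    if (count <= 0) return false;
    const int needed = utf8SequenceLength(bytes[0]);
    if (needed == 0 || needed != count) return false;
    for (int i = 1; i < count; ++i) {
        if ((bytes[i] & 0xc0) != 0x80) return false;
    }
    return true;
}
```

With these helpers, 0xff and 0xfd are rejected outright, and a lone 0xe8 is rejected because its two continuation bytes are missing, which matches the "decoding error of 1 bytes" wording in the warnings.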

Cheng Liang
Nanjing, China
http://www.devbean.info 

_______________________________________________
Interest mailing list
Interest at qt-project.org
http://lists.qt-project.org/mailman/listinfo/interest 		 	   		  
