[Interest] Question about QTextStream and codecs

Murphy, Sean smurphy at walbro.com
Thu May 25 17:26:50 CEST 2017


I ran into an issue with a simple string substitution. I was able to fix it, but I'd like to understand why the fix was needed. Here's a small example of a file that caused the issue:
<?xml version="1.0" encoding="UTF-8"?>
<unitsMap>
	<map>
		<mapItem key="0" value="°"/>
	</map>
<type value="λ"/>
</unitsMap>

The issue was with the degree and lambda symbols. My software goes through the input file line by line looking for lines containing key="0" and, after each one, inserts a copy of that line with a higher key number, like so:
<?xml version="1.0" encoding="UTF-8"?>
<unitsMap>
	<map>
		<mapItem key="0" value="°"/>
		<mapItem key="1" value="°"/>
	</map>
<type value="λ"/>
</unitsMap>

Originally my software just opened the input file, set a QTextStream on it, looped through performing the insertions, then opened the output file, set another QTextStream on it, and wrote the result out. Here's the input processing:
    QTextStream inputStream(&inputXML);
    QTextStream outputStream(&mCloneOutputString, QIODevice::WriteOnly);
    QString sourceString = QString("key=\"%1\"").arg(sourceID);
    QString cloneString = QString("key=\"%1\"").arg(cloneID);

    while(!inputStream.atEnd())
    {
        QString currentLine = inputStream.readLine() + QString("\n");
        // write out current line
        outputStream << currentLine;
        if(currentLine.contains(sourceString))
        {
            QString s = currentLine;
            s.replace(sourceString, cloneString);
            outputStream << s;
        }
    }
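
For illustration, here is the same line-cloning logic as a minimal sketch that uses only the standard library and works on raw bytes. The function name cloneKeyLines and its parameters are hypothetical and not part of my actual code. Because it never decodes or re-encodes the text, multi-byte UTF-8 sequences such as the degree sign pass through unchanged, which is the behaviour I expected from the QTextStream version.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: copy every line of `input` to the result, and after
// each line containing key="<sourceID>" append a copy of that line with the
// key rewritten to key="<cloneID>". Works on bytes, so the text encoding of
// the input is preserved as-is.
std::string cloneKeyLines(const std::string &input, int sourceID, int cloneID)
{
    const std::string sourceKey = "key=\"" + std::to_string(sourceID) + "\"";
    const std::string cloneKey = "key=\"" + std::to_string(cloneID) + "\"";

    std::istringstream in(input);
    std::string out;
    std::string line;
    while (std::getline(in, line)) {
        // Write out the current line unchanged.
        out += line + "\n";

        if (line.find(sourceKey) != std::string::npos) {
            // Replace every occurrence of the source key in the copy,
            // mirroring what QString::replace() does in the Qt version.
            std::string copy = line;
            std::string::size_type pos = copy.find(sourceKey);
            while (pos != std::string::npos) {
                copy.replace(pos, sourceKey.size(), cloneKey);
                pos = copy.find(sourceKey, pos + cloneKey.size());
            }
            out += copy + "\n";
        }
    }
    return out;
}
```

Calling cloneKeyLines(fileContents, 0, 1) on the sample file above would produce the second listing, with the key="1" line inserted after the key="0" line.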

And my save routine is just:
    QFile saveFile(saveFilename);
    saveFile.open(QIODevice::WriteOnly);  // open mode assumed here
    QTextStream ts(&saveFile);
    ts.setCodec("UTF-8");  // I had to insert this line to get it to work
    ts << mCloneOutputString;

Before I added the QTextStream::setCodec() line, my output file produced errors when opened in other text editors, which reported a 0xB0 byte that wasn't valid UTF-8. Sure enough, when I looked at the input and output files in a hex editor I saw this for the degree symbol:
    Input file: 0xC2B0, output file 0xB0
Once I added the line:
    ts.setCodec("UTF-8");  
Then I got:
    Input file: 0xC2B0, output file 0xC2B0
And it would open in other applications without errors.
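
To make the two byte patterns concrete, here is a small standard-library sketch that encodes a code point both as UTF-8 and as ISO-8859-1 (Latin-1). The function names are hypothetical. The lone 0xB0 is what a one-byte encoding such as Latin-1 produces for the degree sign (U+00B0), which would be consistent with the output stream having used a non-UTF-8 codec. That is an assumption on my part; I have not verified which codec was actually in effect.

```cpp
#include <string>

// Hypothetical helper: encode a code point from the Basic Multilingual Plane
// as UTF-8. Surrogates and code points above U+FFFF are not handled here.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode a code point as Latin-1, which maps U+0000 to
// U+00FF onto a single byte each. Returns false if the code point does not
// exist in Latin-1 (for example the Greek lambda, U+03BB).
bool encodeLatin1(char32_t cp, std::string &out)
{
    if (cp > 0xFF) {
        return false;
    }
    out.assign(1, static_cast<char>(cp));
    return true;
}
```

With these, encodeUtf8(0x00B0) yields the two bytes 0xC2 0xB0 seen in the input file, while encodeLatin1(0x00B0, s) yields the single byte 0xB0 seen in the broken output file.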

So I have it all working; I'm just trying to understand why the explicit call to setCodec() was necessary on the writing end but didn't appear to be required on the reading end.
Sean


