[Development] QTextCodec removal and QXmlStreamWriter

Kai Pastor, DG0YT dg0yt at darc.de
Sun Nov 17 10:43:00 CET 2019


(From "RFC: Defaulting to or enforcing UTF-8 locales on Unix systems"...)

Am 17.11.19 um 01:55 schrieb Thiago Macieira:

> It all started with a change (see OP) about removing QTextCodec from the API
> and from QtCore. It seemed reasonable enough but it turned up quite a few
> kinks that hadn't been predicted. One of them, which may still be a
> showstopper, is QXmlStreamReader's inability to handle XML data encoded in
> anything except UTF-8, though a thorough search of all XML files in my system
> turned up exactly zero such files.
By default, QXmlStreamWriter outputs UTF-8. With QTextCodec removed,
will QXmlStreamWriter always output UTF-8? If so, will it be changed to
handle UTF-8 input as efficiently as possible?

At the moment, the public API takes only QString. So unless you already
have a QString, you convert from UTF-8, Latin-1 or raw numeric types
to UTF-16 (QString), and then QXmlStreamWriter converts back to UTF-8 for
output. This double conversion costs a lot of CPU time and extra memory
allocations in what I consider a typical use case. As an example, think
of an SVG document where graphical "paths" are very long sequences of
letters and numbers which are known to be Latin-1 and to not need any
escaping. The effect can be measured by sending the characters directly
to the device instead of going through
QXmlStreamWriter::writeCharacters().
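
To make the round trip concrete, here is a minimal standalone sketch in plain C++. It does not use Qt; the helper names latin1ToUtf16, utf16ToUtf8 and latin1ToUtf8 are hypothetical and only model the steps described above, not how QString or QXmlStreamWriter are actually implemented. The first two functions together stand in for the current path (widen to UTF-16, then encode to UTF-8); the third is a direct path that never builds the intermediate UTF-16 buffer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: widen Latin-1 bytes to UTF-16 code units,
// roughly what building a QString from Latin-1 data has to do.
std::u16string latin1ToUtf16(const std::string &in)
{
    std::u16string out;
    out.reserve(in.size());
    for (unsigned char c : in)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Hypothetical helper: encode UTF-16 as UTF-8, the step an XML writer
// has to perform on output when its API only accepts UTF-16 strings.
std::string utf16ToUtf8(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a valid surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Hypothetical direct path: Latin-1 to UTF-8 in a single pass.
// Bytes below 0x80 are copied unchanged; the rest become two bytes.
std::string latin1ToUtf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For input made only of ASCII letters, digits and spaces, such as SVG path data, both utf16ToUtf8(latin1ToUtf16(s)) and latin1ToUtf8(s) return exactly the bytes of s. In that case the intermediate UTF-16 buffer and its allocation buy nothing, which is the overhead in question.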

Latin-1 element names and attribute names are quite common, too, so they
are also candidates for skipping the UTF-16 (QString) conversion step.

Kai


