[Interest] String best practice

Harald Vistnes harald.vistnes at gmail.com
Wed Mar 15 12:08:38 CET 2017


So to summarize, it sounds like the recommendation is to use QString and
QTextStream by default unless that turns out to be too slow. In that case
one can optimize by using QByteArray, or non-Qt alternatives like re2c,
provided one has control over the encoding.

If the data read in is later put into QStrings, I guess you can just as
well use QString during parsing, as the strings will be converted to UTF-16
at some point anyway. Is that right?

I've written code for reading lots of different formats, some for files up
to several hundred MBs, and each time I wonder if I am doing it the best
way or not. Such a common task and so many ways to do it...

Harald


2017-03-15 11:15 GMT+01:00 Konstantin Tokarev <annulen at yandex.ru>:

>
>
> 15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelmann at qt.io>:
> > On 14.03.2017 10:50, Konstantin Tokarev wrote:
> >>  14.03.2017, 12:44, "Harald Vistnes" <harald.vistnes at gmail.com>:
> >>>  Hi,
> >>>
> >>>  I'm currently working on reading and parsing large ASCII based text
> >>>  files and I am wondering what is the current best practice. There are
> >>>  so many classes and macros available, so it can be a bit confusing to
> >>>  know what to use when.
> >>>
> >>>  QString, QLatin1String, QByteArray, QStringLiteral, QLatin1Literal,
> >>>  QByteArrayLiteral, plain C++ string literal, QStringRef, QStringBuilder
> >>>  and so on. And then std::string and raw const char* strings.
> >>>
> >>>  In my case I want to read a large ASCII file line by line, so I don't
> >>>  need unicode. I need to compare a string with a literal, extract
> >>>  substrings and convert some strings to numbers.
> >>>
> >>>  Should I just use QString all the way, or is it faster to use some
> >>>  other classes when you know you don't need unicode?
> >>  You should use QByteArray here, which is what QIODevice::readLine()
> >>  returns. Avoid using QString as long as possible because that will
> >>  trigger conversion of your text to UTF16 encoding, which may be totally
> >>  useless in your use case.
> >
> > If the program is small and you don't want it to ever grow beyond ASCII,
> > using byte arrays is okay, but in my experience, if you want to be
> > future-proof, you should interpret byte-arrays *as soon as possible*.
> >
> > Then you have an object with a controlled format and you can use that
> > throughout your program, without worrying about encodings.
>
> In the modern world there is one portable encoding used for exchanging data
> between systems: UTF-8. So in a wide range of applications one can safely
> assume all textual (!) byte array data to be UTF-8 or ASCII, and it causes
> no confusion. YMMV though.
>
> Things change if you intermix textual and non-textual QByteArrays close
> together in your code; in this case it's better to store text strings in
> objects of a different class.
>
> > Keeping the
> > data raw will increase the probability that some module does something
> > wrong because it assumes a wrong encoding and breaks your results (e.g.
> > using bytewise comparison for string comparison, which works for ASCII,
> > but not for Unicode - even if both have the same encoding, because there
> > are letters that can be written as several different code point
> > sequences).
> >
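
A small illustration of that last point, as I understand it: the same
visible text "café" can be stored either with a precomposed é (U+00E9) or
as a plain e followed by a combining acute accent (U+0301). Both are valid
UTF-8, yet the bytes differ, so a bytewise comparison reports a mismatch.
The sketch uses plain std::string to stay independent of Qt; the names
precomposed, decomposed and bytewiseEqual are mine. Note that converting to
QString does not by itself remove the difference, since QString's
operator== compares UTF-16 code units; both sides would first have to be
brought to the same form, e.g. with QString::normalized().

```cpp
#include <string>

// "café" with a precomposed é (U+00E9), encoded as UTF-8.
const std::string precomposed = "caf\xC3\xA9";

// "café" as 'e' followed by a combining acute accent (U+0301), also UTF-8.
const std::string decomposed = "cafe\xCC\x81";

// Plain bytewise equality, as memcmp or QByteArray::operator== would do.
bool bytewiseEqual(const std::string &a, const std::string &b)
{
    return a == b;
}
```

Here bytewiseEqual(precomposed, decomposed) is false (5 bytes versus 6),
although both strings render as the same word.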
> > --
> >
> > Viktor Engelmann
> >
> > _______________________________________________
> > Interest mailing list
> > Interest at qt-project.org
> > http://lists.qt-project.org/mailman/listinfo/interest
>
> --
> Regards,
> Konstantin