[Interest] String best practice

Konstantin Tokarev annulen at yandex.ru
Wed Mar 15 11:15:55 CET 2017



15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelmann at qt.io>:
> On 14.03.2017 10:50, Konstantin Tokarev wrote:
>>  14.03.2017, 12:44, "Harald Vistnes" <harald.vistnes at gmail.com>:
>>>  Hi,
>>>
>>>  I'm currently working on reading and parsing large ASCII based text files and I am wondering what is the current best practice. There are so many classes and macros available, so it can be a bit confusing to know what to use when.
>>>
>>>  QString, QLatin1String, QByteArray, QStringLiteral, QLatin1Literal, QByteArrayLiteral, plain C++ string literal, QStringRef, QStringBuilder and so on. And then std::string and raw const char* strings.
>>>
>>>  In my case I want to read a large ASCII file line by line, so I don't need unicode. I need to compare a string with a literal, extract substrings and convert some strings to numbers.
>>>
>>>  Should I just use QString all the way, or is it faster to use some other classes when you know you don't need unicode?
>>  You should use QByteArray here, which is what QIODevice::readLine() returns. Avoid using QString for as long as possible, because that triggers conversion of your text to UTF-16, which may be entirely useless in your use case.
>
> If the program is small and you don't want it to ever grow beyond ASCII,
> using byte arrays is okay, but in my experience, if you want to be
> future-proof, you should interpret byte-arrays *as soon as possible*.
>
> Then you have an object with a controlled format and you can use that
> throughout your program, without worrying about encodings.

In the modern world there is one portable encoding used for exchanging data
between systems: UTF-8. So in a wide range of applications one can safely
assume that all textual (!) byte array data is UTF-8 or ASCII, and this causes
no confusion. YMMV though.
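
As a rough sketch of what staying at the byte level looks like: comparing with
a literal, extracting a substring and converting it to a number all work
directly on the bytes. This is plain standard C++ rather than Qt so that it
compiles without any Qt modules; with QByteArray the corresponding calls would
be startsWith(), mid() and toInt(). The "KEY=<integer>" line format and the
function name parseKeyValueOr are made up for illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical line format, assumed for illustration: "KEY=<integer>".
// Returns the integer after "KEY=" if the line matches, otherwise `fallback`.
// Note: std::strtol also accepts a leading sign or leading whitespace,
// and overflow is not checked here.
long parseKeyValueOr(const std::string &line, const std::string &key, long fallback)
{
    // Compare the start of the line with the literal, byte by byte.
    if (line.compare(0, key.size(), key) != 0)
        return fallback;
    if (line.size() <= key.size() || line[key.size()] != '=')
        return fallback;

    // Extract the substring after '=' and convert it to a number.
    const std::string digits = line.substr(key.size() + 1);
    if (digits.empty())
        return fallback;
    char *end = nullptr;
    const long parsed = std::strtol(digits.c_str(), &end, 10);
    if (end == digits.c_str() || *end != '\0')
        return fallback; // no digits at all, or trailing garbage
    return parsed;
}
```

No step here needs to know anything beyond ASCII, so no conversion to UTF-16
is involved.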

Things change if you mix textual and non-textual QByteArrays close together in
your code; in that case it's better to store text strings in objects of a
different class.
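
One possible way to do that, sketched in plain standard C++ (the names
Utf8Text, BinaryBlob and describe are hypothetical, not Qt API): wrap the
textual bytes in their own type, so the compiler keeps them apart from raw
binary data.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper for bytes that are known to be UTF-8 (or ASCII) text.
class Utf8Text
{
public:
    explicit Utf8Text(std::string bytes) : m_bytes(std::move(bytes)) {}
    const std::string &bytes() const { return m_bytes; }

private:
    std::string m_bytes;
};

// Raw, non-textual payload, e.g. a binary blob from the same source.
typedef std::vector<unsigned char> BinaryBlob;

// The two kinds of data have different types, so each overload
// only ever sees the kind it was written for.
std::string describe(const Utf8Text &text)
{
    return "text: " + text.bytes();
}

std::string describe(const BinaryBlob &blob)
{
    return "binary: " + std::to_string(blob.size()) + " bytes";
}
```

The constructor is explicit on purpose: turning raw bytes into text then has
to be spelled out at the call site.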

> Keeping the
> data raw will increase the probability that some module does something
> wrong because it assumes a wrong encoding and breaks your results (e.g.
> using bytewise comparison for string comparison, which works for ASCII,
> but not for Unicode - even if both have the same encoding, because there
> are characters that can be written as several different code point sequences).
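
To make the comparison problem concrete, here is a small sketch in plain
standard C++ (the function names are made up for illustration). Both strings
are valid UTF-8 and both render as "é", but one uses the precomposed code
point U+00E9 and the other uses 'e' followed by the combining acute accent
U+0301, so a bytewise comparison reports them as different.

```cpp
#include <string>

// "é" as the single precomposed code point U+00E9, encoded in UTF-8.
std::string precomposedEAcute()
{
    return "\xC3\xA9";
}

// "é" as 'e' followed by the combining acute accent U+0301, encoded in UTF-8.
std::string decomposedEAcute()
{
    return "e\xCC\x81";
}

// Plain bytewise comparison, as a raw byte array comparison would do it.
bool bytewiseEqual(const std::string &a, const std::string &b)
{
    return a == b;
}
```

Here bytewiseEqual(precomposedEAcute(), decomposedEAcute()) is false. Treating
the two as equal requires Unicode normalization first; in Qt that would mean
going through QString::normalized(). For pure ASCII input the issue does not
arise.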

-- 
Regards,
Konstantin


