[Interest] String best practice

Konstantin Tokarev annulen at yandex.ru
Wed Mar 15 12:17:36 CET 2017



15.03.2017, 14:09, "Harald Vistnes" <harald.vistnes at gmail.com>:
> So to summarize, it sounds like the recommendation is to use QString and QTextStream by default unless it turns out to be too slow. In that case one can optimize by using QByteArray or non-Qt alternatives like re2c if you have control over the encoding.

For the record, recent versions of re2c support UTF-16, so you can process QTextStream output with it as well.

>
> If the data read in is later put into QStrings, I guess you can just as well use QString during parsing, as the strings will be converted to UTF16 at some point anyway. Is that right?

That largely depends on your data. If the input contains many fixed "control words" and numbers that never end up as QStrings in the data structures you build from parsing, then no: converting everything to UTF-16 first is wasted work. For other workloads it may be perfectly reasonable. For small amounts of data it just doesn't matter.
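
To illustrate the "control words and numbers" case, here is a minimal sketch in standard C++ only (no Qt, so it can be compiled anywhere with C++17). The line format "NODE <id> <name>", the Record struct and parseLine() are all made up for this example. The point is that the keyword and the number are handled as raw bytes, and only the name field is copied out as a string. In Qt code the same idea would presumably apply to the QByteArray you get from readLine(), with only the name converted to QString at the end.

```cpp
#include <charconv>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical record type for this sketch.
struct Record {
    int id = 0;
    std::string name; // the only field that would become a QString in a Qt program
};

// Parses lines of the assumed form "NODE <id> <name>".
// Returns false for any other line; 'out' may be partially written in that case.
bool parseLine(std::string_view line, Record &out)
{
    constexpr std::string_view keyword = "NODE ";

    // Compare the control word bytewise; safe because the input is plain ASCII.
    if (line.substr(0, keyword.size()) != keyword)
        return false;
    line.remove_prefix(keyword.size());

    // Convert the number directly from bytes, without building a string object.
    const char *begin = line.data();
    const char *end = begin + line.size();
    auto [ptr, ec] = std::from_chars(begin, end, out.id);
    if (ec != std::errc() || ptr == end || *ptr != ' ')
        return false;

    // Only the free-text remainder is stored as a string.
    out.name.assign(ptr + 1, end);
    return true;
}
```

Calling parseLine("NODE 42 pump station", r) fills r.id and r.name, while a line such as "EDGE 1 2" is rejected without allocating anything. Whether this beats a QString-based parser on your files is something only a profiler can tell you.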

> I've written code for reading lots of different formats, some for files up to several hundred MBs, and each time I wonder if I am doing it the best way or not. Such a common task and so many ways to do it...

Run your code under a profiler (e.g. callgrind) and see where most of the time is spent. The results may surprise you.

> Harald
>
> 2017-03-15 11:15 GMT+01:00 Konstantin Tokarev <annulen at yandex.ru>:
>
>> 15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelmann at qt.io>:
>>> On 14.03.2017 10:50, Konstantin Tokarev wrote:
>>>>  14.03.2017, 12:44, "Harald Vistnes" <harald.vistnes at gmail.com>:
>>>>>  Hi,
>>>>>
>>>>>  I'm currently working on reading and parsing large ASCII based text files and I am wondering what is the current best practice. There are so many classes and macros available, so it can be a bit confusing to know what to use when.
>>>>>
>>>>>  QString, QLatin1String, QByteArray, QStringLiteral, QLatin1Literal, QByteArrayLiteral, plain C++ string literal, QStringRef, QStringBuilder and so on. And then std::string and raw const char* strings.
>>>>>
>>>>>  In my case I want to read a large ASCII file line by line, so I don't need unicode. I need to compare a string with a literal, extract substrings and convert some strings to numbers.
>>>>>
>>>>>  Should I just use QString all the way, or is it faster to use some other classes when you know you don't need unicode?
>>>>  You should use QByteArray here, which is what QIODevice::readLine() returns. Avoid using QString as long as possible because that will trigger conversion of your text to UTF16 encoding, which may be totally useless in your use case.
>>>
>>> If the program is small and you don't want it to ever grow beyond ASCII,
>>> using byte arrays is okay, but in my experience, if you want to be
>>> future-proof, you should interpret byte-arrays *as soon as possible*.
>>>
>>> Then you have an object with a controlled format and you can use that
>>> throughout your program, without worrying about encodings.
>>
>> In the modern world there is one portable encoding used for exchanging data
>> between systems: UTF-8. So in a wide range of applications one can safely
>> assume all textual (!) byte array data to be UTF-8 or ASCII, and it causes no
>> confusion. YMMV though.
>>
>> Things change if you mix textual and non-textual QByteArrays close together
>> in your code; in that case it's better to store text strings in objects of a
>> different class.
>>
>>> Keeping the
>>> data raw will increase the probability that some module does something
>>> wrong because it assumes a wrong encoding and breaks your results (i.e.
>>> using bytewise comparison for string comparison, which works for ASCII,
>>> but not for unicode - even if both have the same encoding, because there
>>> are letters that have multiple different unicode codepoints).
>>>
>>> --
>>>
>>> Viktor Engelmann
>>>
>>> _______________________________________________
>>> Interest mailing list
>>> Interest at qt-project.org
>>> http://lists.qt-project.org/mailman/listinfo/interest
>>
>> --
>> Regards,
>> Konstantin


-- 
Regards,
Konstantin