[Interest] String best practice

Viktor Engelmann viktor.engelmann at qt.io
Wed Mar 15 15:02:12 CET 2017


On 15.03.2017 11:15, Konstantin Tokarev wrote:
>
> 15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelmann at qt.io>:
>> If the program is small and you never expect it to handle more than ASCII,
>> using byte arrays is okay, but in my experience, if you want to be
>> future-proof, you should interpret (decode) byte arrays *as soon as possible*.
>>
>> Then you have an object with a controlled format and you can use that
>> throughout your program, without worrying about encodings.
> In the modern world there is one portable encoding used for exchanging data
> between systems: UTF-8. So in a wide range of applications one can safely
> assume all textual (!) byte-array data to be UTF-8 or ASCII, and it causes no
> confusion. YMMV though.

I love UTF-8 myself and encode everything in it, but blindly assuming that
all data is always UTF-8 encoded is just asking for trouble.
On Windows, many programs still use Latin-1 (or the closely related Windows-1252) by default.
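Also, good luck finding out why

   if(c[i] == 0xC3 && c[i+1] == 0xA4)

doesn't catch the

  61 cc 88

in the input file (if you even notice that it doesn't). Both are "ä": C3 A4 is the
precomposed U+00E4, while 61 CC 88 is "a" followed by U+0308 COMBINING DIAERESIS.
QString::normalized() to the rescue.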
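
To make the pitfall concrete without pulling in Qt, here is a minimal standard-C++ sketch (both function names are hypothetical). It shows that a bytewise search for C3 A4 finds the precomposed "ä" but not the decomposed one (61 CC 88), and that the two only match once they are brought into the same form. The "normalizer" here is a toy that composes this one letter and nothing else; real code would decode first and call QString::normalized(QString::NormalizationForm_C) (or use ICU) on both sides.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Bytewise search for the precomposed UTF-8 "a-umlaut" (C3 A4), like the
// if(c[i] == 0xC3 && c[i+1] == 0xA4) test. The casts matter: if char is
// signed, c[i] == 0xC3 can never be true.
bool containsPrecomposedAUmlaut(std::string_view bytes) {
    for (std::size_t i = 0; i + 1 < bytes.size(); ++i) {
        if (static_cast<unsigned char>(bytes[i]) == 0xC3 &&
            static_cast<unsigned char>(bytes[i + 1]) == 0xA4)
            return true;
    }
    return false;
}

// Toy stand-in for NFC normalization (hypothetical helper): it only rewrites
// 'a' + U+0308 COMBINING DIAERESIS (61 CC 88) into U+00E4 (C3 A4).
// It is NOT a general normalizer; use QString::normalized() or ICU for that.
std::string composeAUmlautOnly(std::string s) {
    const std::string decomposed = "a\xCC\x88";
    const std::string precomposed = "\xC3\xA4";
    for (std::size_t pos = s.find(decomposed); pos != std::string::npos;
         pos = s.find(decomposed, pos + precomposed.size())) {
        s.replace(pos, decomposed.size(), precomposed);
    }
    return s;
}
```

With `nfd = "a\xCC\x88"`, `containsPrecomposedAUmlaut(nfd)` is false even though it displays as "ä", while `containsPrecomposedAUmlaut(composeAUmlautOnly(nfd))` is true. Having to enumerate every equivalent byte sequence by hand is exactly why normalizing once, right after decoding, is the saner design.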

> Things change if you mix textual and non-textual QByteArrays close together in your
> code; in that case it's better to store text strings in objects of a different class.
>
>> Keeping the
>> data raw increases the probability that some module does something
>> wrong because it assumes the wrong encoding and breaks your results (e.g.
>> using bytewise comparison for string comparison, which works for ASCII
>> but not for Unicode, even if both strings use the same encoding, because
>> some letters can be written as more than one sequence of Unicode code points).
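
A minimal standard-C++ sketch of "interpret the bytes as soon as possible" without blindly assuming UTF-8: strictly validate and decode as UTF-8, and fall back to Latin-1 (ISO-8859-1) only if that fails. The function names are hypothetical, the result is a std::u32string of code points rather than a QString, and the fallback is a heuristic, not real encoding detection. In Qt the two decoding steps correspond roughly to QString::fromUtf8() and QString::fromLatin1(), but QString::fromUtf8() substitutes U+FFFD for invalid input instead of reporting it, so the validity check would have to be done separately.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Strict UTF-8 decoder: returns the code points, or std::nullopt for any
// malformed input (bad lead byte, truncated sequence, overlong form,
// UTF-16 surrogate, or a value above U+10FFFF).
std::optional<std::u32string> decodeUtf8(std::string_view in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                       { cp = lead;        len = 1; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }
        else return std::nullopt;  // stray continuation byte or invalid lead byte
        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return std::nullopt;  // overlong, surrogate, or out of range
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Latin-1 maps every byte to the code point with the same value.
std::u32string decodeLatin1(std::string_view in) {
    std::u32string out;
    out.reserve(in.size());
    for (char c : in) out.push_back(static_cast<unsigned char>(c));
    return out;
}

// Decode at the input boundary: prefer UTF-8, fall back to Latin-1.
std::u32string decodeInput(std::string_view in) {
    if (auto utf8 = decodeUtf8(in)) return *utf8;
    return decodeLatin1(in);
}
```

The fallback can still guess wrong: Latin-1 text containing "Ã¤" is the byte sequence C3 A4, which is valid UTF-8 and decodes as "ä". Windows-1252 also differs from Latin-1 in the range 0x80 to 0x9F, so a fallback for Windows input may need that table instead.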

-- 

Viktor Engelmann
Software Engineer

The Qt Company GmbH
Rudower Chaussee 13
D-12489 Berlin

Viktor.Engelmann at qt.io
+49 151 26784521

http://qt.io
Managing Directors: Mika Pälsi, Juha Varelius, Mika Harjuaho
Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 144331 B




