[Interest] String best practice
Viktor Engelmann
viktor.engelmann at qt.io
Wed Mar 15 15:02:12 CET 2017
On 15.03.2017 11:15, Konstantin Tokarev wrote:
>
> 15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelmann at qt.io>:
>> If the program is small and you don't want it to ever grow beyond ASCII,
>> using byte arrays is okay, but in my experience, if you want to be
>> future-proof, you should interpret byte-arrays *as soon as possible*.
>>
>> Then you have an object with a controlled format and you can use that
>> throughout your program, without worrying about encodings.
> In the modern world there is one portable encoding used for exchanging data
> between systems: UTF-8. So in a wide range of applications one can safely
> assume all textual (!) byte array data to be UTF-8 or ASCII, and it causes no
> confusion. YMMV though.
I love UTF-8 myself and encode everything in it, but blindly assuming that
all data is always UTF-8 encoded is just asking for trouble.
On Windows, many programs still use Latin-1 by default.
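
To illustrate the point about assumed encodings, here is a minimal sketch in plain C++ (no Qt) of a well-formedness check for UTF-8. The function name looksLikeUtf8 and the sample word are made up for this example. It shows that a Latin-1 encoded "ä" (the single byte E4) is not valid UTF-8, so such input can at least be detected rather than silently misread. Passing the check does not prove the data really is UTF-8; it only rules out input that cannot be.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. Overlong forms, surrogates
// and values above U+10FFFF are rejected.
bool looksLikeUtf8(const std::string &bytes)
{
    const auto *p = reinterpret_cast<const unsigned char *>(bytes.data());
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        std::size_t extra = 0;               // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // range of the first continuation byte
        if (b <= 0x7F)                    { extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF)  { extra = 1; }
        else if (b == 0xE0)               { extra = 2; lo = 0xA0; }  // no overlongs
        else if (b == 0xED)               { extra = 2; hi = 0x9F; }  // no surrogates
        else if (b >= 0xE1 && b <= 0xEF)  { extra = 2; }
        else if (b == 0xF0)               { extra = 3; lo = 0x90; }  // no overlongs
        else if (b >= 0xF1 && b <= 0xF3)  { extra = 3; }
        else if (b == 0xF4)               { extra = 3; hi = 0x8F; }  // <= U+10FFFF
        else { return false; }  // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if (c < lo || c > hi) return false;
            lo = 0x80;  // only the first continuation byte is restricted
            hi = 0xBF;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical sample data: "Käse" once as UTF-8 and once as Latin-1.
void checkEncodingExamples()
{
    assert(looksLikeUtf8("K\xC3\xA4se"));  // UTF-8: ä is C3 A4
    assert(!looksLikeUtf8("K\xE4se"));     // Latin-1: ä is E4, not valid UTF-8
}
```

In a Qt program, the decoding step itself would typically go through QString::fromUtf8 or QString::fromLatin1 once the encoding is known.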
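Also, good luck finding out why
    if (c[i] == 0xC3 && c[i+1] == 0xA4)
doesn't catch the
    61 cc 88
in the input file (if you even notice that it doesn't). Both byte sequences
encode "ä": c3 a4 is the precomposed U+00E4, while 61 cc 88 is "a" followed
by the combining diaeresis U+0308.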
QString::normalized to the rescue.
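
As a small illustration of that pitfall, the sketch below (plain C++, no Qt; the function name and sample strings are invented for this example) runs a byte-level search for the precomposed "ä" over two UTF-8 inputs that render identically. It finds the precomposed form and misses the decomposed one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte-level search for the precomposed "ä" (U+00E4, UTF-8: C3 A4).
// The casts to unsigned char matter: where plain char is signed, a byte
// such as 0xC3 would otherwise compare as a negative value.
bool containsPrecomposedAUmlaut(const std::string &utf8)
{
    for (std::size_t i = 0; i + 1 < utf8.size(); ++i) {
        const auto first = static_cast<unsigned char>(utf8[i]);
        const auto second = static_cast<unsigned char>(utf8[i + 1]);
        if (first == 0xC3 && second == 0xA4)
            return true;
    }
    return false;
}

// Hypothetical sample data: the same visible word in two Unicode forms.
void checkNormalizationPitfall()
{
    const std::string precomposed = "K\xC3\xA4se";   // ä as U+00E4
    const std::string decomposed  = "Ka\xCC\x88se";  // a + U+0308

    assert(containsPrecomposedAUmlaut(precomposed));
    assert(!containsPrecomposedAUmlaut(decomposed));  // silently missed
    assert(precomposed != decomposed);                // bytewise compare fails too
}
```

With Qt, decoding both inputs via QString::fromUtf8 and calling normalized(QString::NormalizationForm_C) on the results should yield equal strings, after which comparisons and searches behave as expected.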
> Things change if you intermix textual and non-textual QByteArrays close together
> in your code; in that case it's better to store text strings in objects of a
> different class.
>
>> Keeping the
>> data raw will increase the probability that some module does something
>> wrong because it assumes a wrong encoding and breaks your results (e.g.
>> using bytewise comparison for string comparison, which works for ASCII,
>> but not for Unicode - even if both have the same encoding, because there
>> are letters that can be written as multiple different code point sequences).
--
Viktor Engelmann
Software Engineer
The Qt Company GmbH
Rudower Chaussee 13
D-12489 Berlin
Viktor.Engelmann at qt.io
+49 151 26784521
http://qt.io
Geschäftsführer: Mika Pälsi, Juha Varelius, Mika Harjuaho
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 144331 B