[Development] Hardcoded strings and qstricmp comparison

Wed Nov 14 11:13:51 CET 2018

Edward Welbourne <edward.welbourne at qt.io> wrote the following on 14.11.2018, 11:03:

    Andy Shaw (14 November 2018 09:34) wrote:
    > ... there may be some problems that are connected to using qstricmp
    > and other functions that are expecting latin1 strings for one reason
    > or another. This might be a problem because we encode our source
    > code as UTF-8, so in theory we may not be protecting the strings
    > correctly to ensure that they are treated as latin1 when we write
    > them explicitly internally. It could be that in reality this will never be a
    > problem, and if that is the case then please give me the background on
    > that so I can pass this on too.
    >
    > For user code I get that we can just say that they should do something
    > like:
    >
    >   qstricmp(str, QLatin1String("a").latin1());
    >
    > and that would be ensuring it is correctly seen as a latin1 encoded
    > string. If this is how it should be done, then shouldn’t we change our
    > usage of it in the Qt code as well to do the same thing? Or am I
    > missing something?

    Well, as long as the source string is actually printable ASCII, there
    should be no problem, as UTF-8 and Latin-1 agree on those.

    If the source string contains bytes > 127, such bytes should be encoded
    using suitable escapes; if the string is to be read as Latin-1 and the
    source file is encoded in UTF-8, the raw form of the bytes would not be
    displayed as the character it would actually be read as.

    So I'm not quite sure what problem you're referring to, but if a string
    in the source is meant to be Latin-1, it shouldn't be entered in literal
    form in the source code, it should be entered using escapes,

      QLatin1Char yUmlaut('\xff');

    for example.

I am speaking purely theoretically; there may not be a problem. I don't know enough to be sure, which is why I am asking. I think this clarifies enough for me, though: as long as we only use ASCII characters (<= 127) in our string literals, there is no problem on our side. Users just have to remember, when they call these functions, that their source code will be UTF-8 encoded rather than Latin-1, and so they should escape any characters > 127.
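
To make the byte-level difference concrete, here is a minimal sketch in plain C++ (no Qt, so it says nothing about what qstricmp itself does). The names kLatin1YUmlaut, kUtf8YUmlaut, isPureAscii and decodeLatin1 are made up for illustration. It spells out the bytes a compiler would typically store for a literal "ÿ" typed into a UTF-8 source file and compares them with the escaped Latin-1 form:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// U+00FF (y with diaeresis) as a Latin-1 byte string: a single byte,
// written with an escape so the source file itself stays pure ASCII.
const char kLatin1YUmlaut[] = "\xff";

// The bytes that the same character occupies when it is typed literally
// into a UTF-8 encoded source file: two bytes, 0xC3 0xBF.
const char kUtf8YUmlaut[] = "\xc3\xbf";

// Hypothetical helper: true if every byte is <= 127, in which case the
// literal means the same thing read as ASCII, Latin-1 or UTF-8.
bool isPureAscii(const char *s)
{
    for (; *s; ++s) {
        if (static_cast<unsigned char>(*s) > 127)
            return false;
    }
    return true;
}

// Hypothetical helper: decode a byte string as Latin-1, where each byte
// maps directly to the code point with the same numeric value.
std::u32string decodeLatin1(const char *s)
{
    std::u32string out;
    for (; *s; ++s)
        out.push_back(static_cast<unsigned char>(*s));
    return out;
}
```

Under these assumptions, decodeLatin1(kLatin1YUmlaut) gives the single code point U+00FF, while decodeLatin1(kUtf8YUmlaut) gives two code points (U+00C3 and U+00BF), which is the mismatch that escaping avoids. A literal such as "a" passes isPureAscii and is safe either way.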

Andy