[Development] [Qt5-feedback] A micro API review: for V3(md5) and V5(sha1) in QUuid

Fri Dec 30 15:50:38 CET 2011

Hi João

On Fri, Dec 23, 2011 at 6:31 PM,  <joao.abecasis at nokia.com> wrote:
> [ Re-trying after the previous massive quoting and line-wrap fail :-/ ]
>
> Denis Dzyubenko wrote:
>> 2011/12/9 João Abecasis <joao.abecasis at nokia.com>:
>> >>    inline QUuid QUuid::createFromName(const QUuid &ns, const
>> >>    QString &name)
>> >>    {
>> >>        return createFromName(ns, name.toUtf8());
>> >>    }
>> >
>> > would only be updated to call the right implementations, as
>> > appropriate.
>>
>> I like the current status of the patch very much.
>>
>> However, I have one question: where does UTF-8 come from? Shouldn't it
>> be defined by the RFC? If not, IMO we shouldn't arbitrarily choose an
>> encoding, and should maybe leave the default one in, which is UTF-16
>> for QString.
>
> This is my reasoning:
>
> 1) As you mention the RFC doesn't specify encodings. In fact, it says
> the owner of a namespace is free to decide how it should be used. For
> this reason it's important that we support QByteArray as the canonical
> form and let users make conscious decisions.

Absolutely agree with that. I would even add an overload that takes
(char *, int len) to avoid allocating a d-pointer for a QByteArray.

> 2) In Qt, strings of text are represented as QString so it would be nice
> to support QString-based names. This is the reason for adding those
> overloads as convenience API, but doesn't tell us how QString-based
> names should be translated to "a canonical sequence of octets" (quoting
> the standard).
>
> 3) The point of name-based UUIDs is that you can regenerate the UUIDs
> knowing only the namespace UUID and a particular name. If you use the
> QByteArray version, it's up to you to ensure this. When using the QString
> version Qt needs to ensure it for you.
>
> This excludes locale- and system-dependent conversions, like
> toLocal8Bit(). It also excludes straightforward utf16(), as its byte
> layout depends on endianness, and thus on the platform.
>
> 4) UTF-8 is a good candidate because it is one possible "canonical
> sequence of octets". But it's mostly that, a good candidate.

That is a very good reason indeed! I hadn't thought about the endianness
of UTF-16.
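
To make the endianness point concrete, here is a small sketch in standard
C++ (no Qt). The function names utf16Bytes and hostOrderBytes are made up
for this example. It serializes the same UTF-16 code units with an
explicit byte order, and separately copies them out of memory as-is, which
is roughly what hashing the raw utf16() buffer would amount to.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Serialize UTF-16 code units with an explicitly chosen byte order.
// The output is the same on every platform.
std::vector<std::uint8_t> utf16Bytes(const std::u16string &s, bool bigEndian)
{
    std::vector<std::uint8_t> out;
    out.reserve(s.size() * 2);
    for (char16_t u : s) {
        const std::uint8_t hi = static_cast<std::uint8_t>(u >> 8);
        const std::uint8_t lo = static_cast<std::uint8_t>(u & 0xFF);
        if (bigEndian) {
            out.push_back(hi);
            out.push_back(lo);
        } else {
            out.push_back(lo);
            out.push_back(hi);
        }
    }
    return out;
}

// Copy the code units exactly as they sit in memory. The result depends
// on the byte order of the machine running the code.
std::vector<std::uint8_t> hostOrderBytes(const std::u16string &s)
{
    std::vector<std::uint8_t> out(s.size() * 2);
    if (!s.empty())
        std::memcpy(out.data(), s.data(), out.size());
    return out;
}
```

For the name "Qt", utf16Bytes gives 51 00 74 00 in little-endian order
and 00 51 00 74 in big-endian order. hostOrderBytes returns whichever of
the two matches the host, so a hash over it would differ between
platforms, while a hash over an explicitly ordered sequence would not.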

Another alternative would be to always use UTF-16 little-endian (since
little-endian systems are the most common) in a canonical form (e.g.
normalization form D, to make it cheap on Mac).

> So, there isn't a reason why it *has* to be utf-8, but I haven't seen
> better alternatives. Other alternatives are toAscii or toLatin1, but
> they're lossy encodings. Network-byte order UTF-16?...

Denis.