[Development] [Qt5-feedback] A micro API review: for V3(md5) and V5(sha1) in QUuid

Mon Dec 26 11:11:46 CET 2011

On 12/23/11 6:31 PM, "ext joao.abecasis at nokia.com"
<joao.abecasis at nokia.com> wrote:

>[ Re-trying after the previous massive quoting and line-wrap fail :-/ ]
>
>Denis Dzyubenko wrote:
>> 2011/12/9 João Abecasis <joao.abecasis at nokia.com>:
>> >>    inline QUuid QUuid::createFromName(const QUuid &ns, const
>> >>    QString &name)
>> >>    {
>> >>        return createFromName(ns, name.toUtf8());
>> >>    }
>> >
>> > would only be updated to call the right implementations, as
>> > appropriate.
>> 
>> I like the current status of the patch very much.
>> 
>> However, I have one question: where does utf8 come from? Shouldn't it
>> be defined by the RFC? If not, imo we shouldn't arbitrarily choose an
>> encoding, and should maybe leave the default one in, which is utf-16
>> for QString.
>
>This is my reasoning:
>
>1) As you mention the RFC doesn't specify encodings. In fact, it says
>the owner of a namespace is free to decide how it should be used. For
>this reason it's important that we support QByteArray as the canonical
>form and let users make conscious decisions.
>
>2) In Qt, strings of text are represented as QString so it would be nice
>to support QString-based names. This is the reason for adding those
>overloads as convenience API, but doesn't tell us how QString-based
>names should be translated to "a canonical sequence of octets" (quoting
>the standard).
>
>3) The point of name-based UUIDs is that you can regenerate the UUIDs
>knowing only the namespace UUID and a particular name. If you use the
>QByteArray version, it's up to you to ensure this. When using the QString
>version Qt needs to ensure it for you.
>
>This excludes locale- and system-dependent conversions like
>toLocal8Bit(). It also excludes straightforward utf16(), as its byte
>layout depends on endianness, and thus on the platform.
>
>4) UTF-8 is a good candidate because it is one possible "canonical
>sequence of octets". But it's mostly that, a good candidate.
>
>So, there isn't a reason why it *has* to be utf-8, but I haven't seen
>better alternatives. Other alternatives are toAscii or toLatin1, but
>they're lossy encodings. Network-byte order UTF-16?...
>
>Anyway, one use case mentioned in the standard makes this convenience
>approach very nice:
>
>    QUrl url;
>
>    // ...
>
>    // NameSpace_DNS from RFC4122
>    // {6ba7b810-9dad-11d1-80b4-00c04fd430c8}
>    QUuid nsDns(0x6ba7b810, 0x9dad, 0x11d1, 0x80, 0xb4,
>        0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8);
>
>    QUuid uuidForUrl = QUuid::createFromName(nsDns, url.toString());
>
>With the added benefit that in that use case it interoperates with
>Python.
>
>("And what does python do?", you ask. Well, it avoids the decision
>altogether and bails out on unicode strings. It only accepts
>byte strings:
>
>    $ python
>    Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
>    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>    Type "help", "copyright", "credits" or "license" for more information.
>    >>> import uuid
>    >>> uuid.NAMESPACE_DNS
>    UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
>    >>> uuid.uuid3(uuid.NAMESPACE_DNS, "www.widgets.com")
>    UUID('3d813cbb-47fb-32ba-91df-831e1593ac29')
>    >>> uuid.uuid3(uuid.NAMESPACE_DNS, u"www.widgets.com")
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/uuid.py", line 512, in uuid3
>        hash = md5(namespace.bytes + name).digest()
>    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position
>    1: ordinal not in range(128)
>
>)
>
>What do others think?
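
To make points 3 and 4 of the quoted reasoning concrete, here is a minimal sketch in Python 3 (not Qt code, and not the patch under review). The helper name uuid3_from_text is hypothetical; it assumes the name is converted to octets with an explicitly chosen encoding and then hashed with the namespace bytes, as the traceback above shows Python's own uuid3 doing. It illustrates that UTF-8 gives one reproducible result, while UTF-16 gives a different UUID depending on byte order.

```python
import hashlib
import uuid


def uuid3_from_text(namespace: uuid.UUID, name: str, encoding: str) -> uuid.UUID:
    """Hypothetical helper: version 3 (MD5) UUID from a text name.

    The name is turned into octets with an explicitly chosen encoding,
    then hashed together with the 16 namespace bytes.
    """
    octets = name.encode(encoding)
    digest = hashlib.md5(namespace.bytes + octets).digest()
    # version=3 makes the constructor set the version and variant bits.
    return uuid.UUID(bytes=digest, version=3)


name = "www.widgets.com"
via_utf8 = uuid3_from_text(uuid.NAMESPACE_DNS, name, "utf-8")
via_utf16_le = uuid3_from_text(uuid.NAMESPACE_DNS, name, "utf-16-le")
via_utf16_be = uuid3_from_text(uuid.NAMESPACE_DNS, name, "utf-16-be")

print(via_utf8)                      # same octets as the byte-string call quoted above
print(via_utf16_le == via_utf16_be)  # the two byte orders hash differently
```

With the "utf-8" argument the octets are identical to the byte string passed to uuid.uuid3 in the quoted session, so the result should match it. The two UTF-16 variants feed different octets to MD5, which is the platform dependence that rules out plain utf16().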

I can see only two options that make sense: either accept only ASCII (i.e.
code points smaller than 0x80), or use UTF-8. The first option is a subset
of the second one.
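
As a rough illustration of the subset relation, the following Python 3 sketch compares the two policies. The names name_to_octets and create_from_name are made up for this example and are not the proposed QUuid API; the sketch assumes the ASCII-only policy simply rejects any code point of 0x80 or above.

```python
import hashlib
import uuid


def name_to_octets(name: str, ascii_only: bool) -> bytes:
    """Hypothetical conversion policy for text names."""
    if ascii_only:
        # Raises UnicodeEncodeError for any code point >= 0x80.
        return name.encode("ascii")
    return name.encode("utf-8")


def create_from_name(ns: uuid.UUID, name: str, ascii_only: bool = False) -> uuid.UUID:
    """Hypothetical version 3 (MD5) name-based UUID using the policy above."""
    digest = hashlib.md5(ns.bytes + name_to_octets(name, ascii_only)).digest()
    return uuid.UUID(bytes=digest, version=3)


# For a pure-ASCII name both policies produce the same octets.
strict = create_from_name(uuid.NAMESPACE_DNS, "www.widgets.com", ascii_only=True)
lenient = create_from_name(uuid.NAMESPACE_DNS, "www.widgets.com")
same_for_ascii = strict == lenient

# A non-ASCII name is rejected by the strict policy but accepted by UTF-8.
try:
    create_from_name(uuid.NAMESPACE_DNS, "www.wídgets.com", ascii_only=True)
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

Since every ASCII name encodes to the same octets under UTF-8, starting with the ASCII-only policy and relaxing it to UTF-8 later would not change any UUID that was already generated.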

Cheers,
Lars