[Development] [Qt5-feedback] A micro API review: for V3(md5) and V5(sha1) in QUuid

Fri Dec 23 18:31:04 CET 2011

[ Re-trying after the previous massive quoting and line-wrap fail :-/ ]

Denis Dzyubenko wrote:
> 2011/12/9 João Abecasis <joao.abecasis at nokia.com>:
> >>    inline QUuid QUuid::createFromName(const QUuid &ns, const
> >>    QString &name)
> >>    {
> >>        return createFromName(ns, name.toUtf8());
> >>    }
> >
> > would only be updated to call the right implementations, as
> > appropriate.
> 
> I like the current status of the patch very much.
> 
> However I have one question - where utf8 comes from? Shouldn't it be
> defined by rfc, and if not imo we shouldn't arbitrary choose
> encodings, and maybe leave the default one in - which is utf-16 for
> QString

This is my reasoning:

1) As you mention, the RFC doesn't specify an encoding. In fact, it says
the owner of a namespace is free to decide how it should be used. For
this reason it's important that we support QByteArray as the canonical
form and let users make conscious decisions.

2) In Qt, strings of text are represented as QString, so it would be nice
to support QString-based names. This is the reason for adding those
overloads as convenience API, but it doesn't tell us how QString-based
names should be translated to "a canonical sequence of octets" (quoting
the standard).

3) The point of name-based UUIDs is that you can regenerate the UUIDs
knowing only the namespace UUID and a particular name. If you use the
QByteArray version, it's up to you to ensure this. When using the QString
version Qt needs to ensure it for you.

This excludes locale- and system-dependent conversions such as
toLocal8Bit(). It also excludes plain utf16(), whose byte sequence
depends on endianness and thus on the platform.
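
To make the endianness point concrete, here is a minimal Python sketch
(not Qt code). name_uuid_v3 is a hypothetical helper that follows the
RFC 4122 recipe for version 3, i.e. MD5 over the namespace bytes followed
by the name bytes. It assumes Python's "utf-16-le" and "utf-16-be" codecs
are a fair stand-in for the raw bytes utf16() would give you on little-
and big-endian machines.

```python
import hashlib
import uuid


def name_uuid_v3(namespace, name_bytes):
    """Hypothetical helper: version 3 (MD5) UUID from a namespace and raw octets."""
    digest = hashlib.md5(namespace.bytes + name_bytes).digest()
    # uuid.UUID overwrites the version and variant bits when version= is given.
    return uuid.UUID(bytes=digest[:16], version=3)


name = u"www.widgets.com"

# The same text, serialized as UTF-16 in the two possible byte orders.
from_le = name_uuid_v3(uuid.NAMESPACE_DNS, name.encode("utf-16-le"))
from_be = name_uuid_v3(uuid.NAMESPACE_DNS, name.encode("utf-16-be"))

# UTF-8 has no byte-order variants, so every platform hashes the same octets.
from_utf8 = name_uuid_v3(uuid.NAMESPACE_DNS, name.encode("utf-8"))

print(from_le == from_be)  # False: same name, two different UUIDs
print(from_utf8)
```

The two UTF-16 results differ even though the name is identical, which
is exactly what would break regenerating a UUID on another platform.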

4) UTF-8 is a good candidate because it is one possible "canonical
sequence of octets". But it's mostly that, a good candidate.

So, there isn't a reason why it *has* to be UTF-8, but I haven't seen
better alternatives. Other options are toAscii() or toLatin1(), but
both are lossy encodings. Network byte order UTF-16, perhaps?
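
A rough Python sketch of why a lossy encoding is a problem for name-based
UUIDs. name_uuid_v5 is a hypothetical helper (SHA-1 variant of the RFC
recipe), and errors="replace" is assumed to approximate a conversion that
substitutes '?' for characters it cannot represent.

```python
import hashlib
import uuid


def name_uuid_v5(namespace, name_bytes):
    """Hypothetical helper: version 5 (SHA-1) UUID from a namespace and raw octets."""
    digest = hashlib.sha1(namespace.bytes + name_bytes).digest()
    return uuid.UUID(bytes=digest[:16], version=5)


ns = uuid.NAMESPACE_DNS

# Two different names, neither representable in Latin-1.
greek = u"\u03b1\u03b2\u03b3"
cyrillic = u"\u0430\u0431\u0432"

# Lossy conversion: every unrepresentable character becomes '?'.
lossy_greek = greek.encode("latin-1", "replace")
lossy_cyrillic = cyrillic.encode("latin-1", "replace")

# Distinct names collapse to the same octets, hence the same UUID.
collide = name_uuid_v5(ns, lossy_greek) == name_uuid_v5(ns, lossy_cyrillic)

# Lossless encodings keep the names, and therefore the UUIDs, apart.
distinct_utf8 = (name_uuid_v5(ns, greek.encode("utf-8"))
                 != name_uuid_v5(ns, cyrillic.encode("utf-8")))
distinct_utf16be = (name_uuid_v5(ns, greek.encode("utf-16-be"))
                    != name_uuid_v5(ns, cyrillic.encode("utf-16-be")))

print(collide, distinct_utf8, distinct_utf16be)  # True True True
```

Both UTF-8 and big-endian UTF-16 avoid the collision. The sketch says
nothing about which of the two is the better default.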

Anyway, one use case mentioned in the standard makes this convenience
approach very nice:

    QUrl url;

    // ...

    // NameSpace_DNS from RFC4122
    // {6ba7b810-9dad-11d1-80b4-00c04fd430c8}
    QUuid nsDns(0x6ba7b810, 0x9dad, 0x11d1, 0x80, 0xb4,
        0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8);

    QUuid uuidForUrl = QUuid::createFromName(nsDns, url.toString());

This has the added benefit that, in this use case, it interoperates
with Python.

("And what does Python do?", you ask. Well, it avoids the decision
altogether and bails out on unicode strings. It only accepts
byte strings:

    $ python
    Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import uuid
    >>> uuid.NAMESPACE_DNS
    UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
    >>> uuid.uuid3(uuid.NAMESPACE_DNS, "www.widgets.com")
    UUID('3d813cbb-47fb-32ba-91df-831e1593ac29')
    >>> uuid.uuid3(uuid.NAMESPACE_DNS, u"www.widgets.com")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/uuid.py",
        line 512, in uuid3
        hash = md5(namespace.bytes + name).digest()
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position
    1: ordinal not in range(128)

)
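
For comparison, a small sketch of what the Python side has to do to match
a UTF-8 based QString overload: encode explicitly, then hash. The name
below is made up for illustration. Unlike the 2.6 session above, this
assumes Python 3, where uuid.uuid3() and uuid.uuid5() accept str and
encode it as UTF-8 themselves, so the manual and the library results can
be compared directly.

```python
import hashlib
import uuid

ns = uuid.NAMESPACE_DNS
name = u"www.b\u00fccher.example"  # hypothetical name, non-ASCII on purpose

# The caller makes the encoding decision explicit.
octets = name.encode("utf-8")

# Version 3 hashes with MD5, version 5 with SHA-1; both keep 16 bytes.
v3 = uuid.UUID(bytes=hashlib.md5(ns.bytes + octets).digest()[:16], version=3)
v5 = uuid.UUID(bytes=hashlib.sha1(ns.bytes + octets).digest()[:16], version=5)

# On Python 3 the library functions make the same UTF-8 choice.
print(v3 == uuid.uuid3(ns, name))  # True
print(v5 == uuid.uuid5(ns, name))  # True
```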

What do others think?

Cheers,

João