[Development] [Qt5-feedback] A micro API review: for V3(md5) and V5(sha1) in QUuid
lars.knoll at nokia.com
Mon Dec 26 11:11:46 CET 2011
On 12/23/11 6:31 PM, "ext joao.abecasis at nokia.com"
<joao.abecasis at nokia.com> wrote:
>[ Re-trying after the previous massive quoting and line-wrap fail :-/ ]
>
>Denis Dzyubenko wrote:
>> 2011/12/9 João Abecasis <joao.abecasis at nokia.com>:
>> >> inline QUuid QUuid::createFromName(const QUuid &ns, const
>> >> QString &name)
>> >> {
>> >> return createFromName(ns, name.toUtf8());
>> >> }
>> >
>> > would only be updated to call the right implementations, as
>> > appropriate.
>>
>> I like the current status of the patch very much.
>>
>> However, I have one question: where does utf8 come from? Shouldn't it
>> be defined by the RFC? If it isn't, imo we shouldn't arbitrarily choose
>> an encoding, and should maybe leave the default one in, which is utf-16
>> for QString.
>
>This is my reasoning:
>
>1) As you mention the RFC doesn't specify encodings. In fact, it says
>the owner of a namespace is free to decide how it should be used. For
>this reason it's important that we support QByteArray as the canonical
>form and let users make conscious decisions.
>
>2) In Qt, strings of text are represented as QString so it would be nice
>to support QString-based names. This is the reason for adding those
>overloads as convenience API, but doesn't tell us how QString-based
>names should be translated to "a canonical sequence of octets" (quoting
>the standard).
>
>3) The point of name-based UUIDs is that you can regenerate the UUIDs
>knowing only the namespace UUID and a particular name. If you use the
>QByteArray version, it's up to you to ensure this. When using the QString
>version Qt needs to ensure it for you.
>
>This excludes locale- and system-dependent conversions, like
>toLocal8Bit(). It also excludes a straightforward utf16(), as its byte
>layout depends on endianness, and thus on the platform.
>
>4) UTF-8 is a good candidate because it is one possible "canonical
>sequence of octets". But it's mostly that, a good candidate.
>
>So, there isn't a reason why it *has* to be utf-8, but I haven't seen
>better alternatives. Other alternatives are toAscii or toLatin1, but
>they're lossy encodings. Network-byte order UTF-16?...
>
>Anyway, one use case mentioned in the standard makes this convenience
>approach very nice:
>
> QUrl url;
>
> // ...
>
> // NameSpace_DNS from RFC4122
> // {6ba7b810-9dad-11d1-80b4-00c04fd430c8}
> QUuid nsDns(0x6ba7b810, 0x9dad, 0x11d1, 0x80, 0xb4,
> 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8);
>
> QUuid uuidForUrl = QUuid::createFromName(nsDns, url.toString());
>
>With the added benefit that in that use case it interoperates with
>Python.
>
>("And what does Python do?", you ask. Well, it avoids the decision
>altogether and bails out on unicode strings. It only accepts
>byte strings:
>
> $ python
> Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import uuid
> >>> uuid.NAMESPACE_DNS
> UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
> >>> uuid.uuid3(uuid.NAMESPACE_DNS, "www.widgets.com")
> UUID('3d813cbb-47fb-32ba-91df-831e1593ac29')
> >>> uuid.uuid3(uuid.NAMESPACE_DNS, u"www.widgets.com")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/uuid.py", line 512, in uuid3
> hash = md5(namespace.bytes + name).digest()
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position
> 1: ordinal not in range(128)
>
>)
>
>What do others think?
I can see only two options that make sense: either accept only ASCII (i.e.
code points below 0x80), or use UTF-8. The first option is a subset of
the second one.
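
To illustrate the subset relationship, here is a minimal sketch in Python 3
(not the Python 2.6 of the session quoted above). The helper
uuid3_from_text() is hypothetical and exists only for this example. It
hashes the namespace bytes followed by the explicitly encoded name, which
is the same step shown in the quoted traceback. For an ASCII-only name the
ASCII and UTF-8 encodings are byte-identical, so both yield the same UUID,
whereas the two UTF-16 byte orders yield different ones:

```python
import hashlib
import uuid


def uuid3_from_text(namespace, name, encoding="utf-8"):
    """Hypothetical helper: name-based (MD5) UUID from an explicitly encoded name."""
    octets = name.encode(encoding)  # the "canonical sequence of octets"
    digest = hashlib.md5(namespace.bytes + octets).digest()
    # version=3 makes uuid.UUID set the RFC 4122 variant and version bits.
    return uuid.UUID(bytes=digest, version=3)


name = "www.widgets.com"  # pure ASCII: every code point is below 0x80

as_ascii = uuid3_from_text(uuid.NAMESPACE_DNS, name, "ascii")
as_utf8 = uuid3_from_text(uuid.NAMESPACE_DNS, name, "utf-8")
as_utf16_le = uuid3_from_text(uuid.NAMESPACE_DNS, name, "utf-16-le")
as_utf16_be = uuid3_from_text(uuid.NAMESPACE_DNS, name, "utf-16-be")

print(as_ascii == as_utf8)         # same octets, same UUID
print(as_utf16_le == as_utf16_be)  # byte order changes the octets
```

With either choice, existing ASCII names keep producing the same UUIDs;
only names with code points at or above 0x80 are affected by the decision.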
Cheers,
Lars