[Qt5-feedback] QUrl (was: QMimeType)
Thiago Macieira
thiago at kde.org
Sun Jun 5 12:00:53 CEST 2011
On Sunday, 5 June 2011, at 10:00:37, Ivan Čukić wrote:
> It currently inherits KServiceType which will be changed, and it uses
> KUrl which largely exists due to some parsing problems in QUrl. We are
> hoping to push the fixes to QUrl to allow us to drop KUrl in kdelibs
> 5.
KUrl doesn't do any parsing. It uses QUrl for parsing. Therefore, "parsing
problems in QUrl" cannot be true, as it would be KUrl parsing problems too.
KUrl exists mostly to keep KDE 3 KURL API compatibility.
In any case, QUrl in Qt 5 requires a rewrite of its API. Not the parsing --
that one is fine. QUrl has a completely flawed API, owing to a long-standing
misunderstanding of what a URL is.
URLs and URIs are "designed by committee" and are simultaneously:
- Unicode
- UTF-8 encoded
- binary
So the following two URLs are the same:
http://localhost/R%C3%A9sum%C3%A9.pdf
http://localhost/Résumé.pdf
but the following URL is permitted too:
http://localhost/R%E9sum%E9.pdf
Note how "é" expands to %C3%A9 (URLs are Unicode UTF-8 encoded) but at the
same time the byte 0xE9 is permitted too (non-UTF8). QString is therefore
inadequate to represent this in fully-decoded form for the path component: it
is "/Résumé.pdf" for the first two URLs, but what is its value for the third?
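To make the three cases concrete, here is a minimal sketch using Python's urllib.parse rather than QUrl (so it only illustrates the byte-level point, not Qt's behaviour). The helper names path_bytes and is_utf8 are made up for this example.

```python
from urllib.parse import urlsplit, unquote_to_bytes

urls = [
    "http://localhost/R%C3%A9sum%C3%A9.pdf",
    "http://localhost/Résumé.pdf",
    "http://localhost/R%E9sum%E9.pdf",
]

def path_bytes(url):
    # Fully decode the path down to raw bytes; unescaped non-ASCII
    # characters in the input are taken as UTF-8.
    return unquote_to_bytes(urlsplit(url).path)

def is_utf8(data):
    # True if the byte string can be turned into Unicode text via UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for url in urls:
    raw = path_bytes(url)
    print(raw, "valid UTF-8" if is_utf8(raw) else "not valid UTF-8")
```

The first two URLs yield the same bytes, which decode to "/Résumé.pdf". The third yields a lone 0xE9 byte that is not valid UTF-8, so there is no single obvious Unicode string to return for it.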
Also note how the following two URLs are *not* the same:
http://localhost/foo/bar
http://localhost/foo%2Fbar
despite the slash character being 0x2F.
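A rough illustration of why these differ, again with Python's urllib.parse standing in for QUrl (the segments helper is hypothetical): if the path is split on literal slashes before decoding, the two URLs have different segment lists, whereas decoding the whole path first makes them indistinguishable.

```python
from urllib.parse import urlsplit, unquote

def segments(url):
    # Split on literal "/" first, then decode each segment on its own.
    return [unquote(s) for s in urlsplit(url).path.split("/")[1:]]

a = "http://localhost/foo/bar"
b = "http://localhost/foo%2Fbar"

print(segments(a))  # two segments: "foo" and "bar"
print(segments(b))  # one segment containing a slash: "foo/bar"

# Fully decoding the whole path erases the difference:
print(unquote(urlsplit(a).path) == unquote(urlsplit(b).path))
```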
So again QString is inadequate to represent a component of a URL in
fully-decoded form, which is what QUrl::path() does. At the same time,
QUrl::encodedPath() returning a QByteArray with %-encoding is hard to use.
The slash character may be a corner case, but these two are also definitely not
the same:
http://localhost/foo?arg=value#anchor
http://localhost/foo%3Farg=value%23anchor
QUrl decodes the second URL properly, and QUrl::path() returns
"/foo?arg=value#anchor", which is fine. But then if you call QUrl::toString(),
you get the first URL, which is *not* fine, as we established that they are
different URLs. And to top it all off, QUrl's constructor uses the same flawed
fully-decoded notation.
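The round-trip problem can be reproduced outside Qt. The following sketch does not call QUrl; it imitates the "decode the path fully, then print it back" behaviour described above using Python's urllib.parse, and the variable names are invented for the example.

```python
from urllib.parse import urlsplit, unquote

original = "http://localhost/foo%3Farg=value%23anchor"
parts = urlsplit(original)

# The parser sees one path and no query or fragment.
decoded_path = unquote(parts.path)  # "/foo?arg=value#anchor"

# Naively printing the fully-decoded path back into a URL string...
naive = f"{parts.scheme}://{parts.netloc}{decoded_path}"

# ...produces a string that parses as a different URL.
reparsed = urlsplit(naive)
print(parts.path, repr(parts.query), repr(parts.fragment))
print(reparsed.path, repr(reparsed.query), repr(reparsed.fragment))
```

After the round trip the path has shrunk to "/foo" and a query and a fragment have appeared, which is the first URL, not the second.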
In my view, QUrl should be modified to use *only* partially-decoded components
and provide a method (toEncoded()) that returns the fully-encoded form for
proper network transfer. The partially-decoded form would decode %-encodings
that are UTF-8 sequences, including %20 to space, but not including delimiter
characters (so it won't decode %3F to a question mark in a path component,
but it would decode it in the query and fragment component).
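As a sketch of what such a partially-decoded form could look like (this is not QUrl code, and not a proposal for the actual implementation): the function partially_decode and the delimiter sets PATH_KEEP and QUERY_KEEP below are assumptions made for illustration. In particular, keeping "%" itself encoded and the exact choice of delimiters per component are my guesses, not something specified above.

```python
import re

# Hypothetical delimiter sets: characters that stay %-encoded per component.
PATH_KEEP = "/?#%"
QUERY_KEEP = "#%"

def partially_decode(component, keep_encoded):
    """Decode %XX sequences that are valid UTF-8, except delimiters."""
    def repl(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        # surrogateescape maps each undecodable byte 0xNN to U+DCNN,
        # so non-UTF-8 bytes can be spotted and left encoded.
        text = raw.decode("utf-8", errors="surrogateescape")
        out = []
        for ch in text:
            code = ord(ch)
            if 0xDC80 <= code <= 0xDCFF:
                out.append("%{:02X}".format(code - 0xDC00))  # binary byte
            elif ch in keep_encoded:
                out.append("%{:02X}".format(code))           # delimiter
            else:
                out.append(ch)                               # plain text
        return "".join(out)
    return re.sub(r"(?:%[0-9A-Fa-f]{2})+", repl, component)

print(partially_decode("/R%C3%A9sum%C3%A9.pdf", PATH_KEEP))
print(partially_decode("/R%E9sum%E9.pdf", PATH_KEEP))
print(partially_decode("/foo%3Farg=value%23anchor", PATH_KEEP))
print(partially_decode("arg%3Fvalue", QUERY_KEEP))
```

With these assumptions the UTF-8 path becomes readable text, the 0xE9 bytes and the encoded delimiters in the path stay as they were, and %3F is decoded only in the query.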
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Senior Product Manager - Nokia, Qt Development Frameworks
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358