[Qt5-feedback] QUrl (was: QMimeType)

David Faure faure at kde.org
Sun Jun 5 14:06:20 CEST 2011


On Sunday 05 June 2011, Thiago Macieira wrote:
> KUrl doesn't do any parsing. It uses QUrl for parsing. Therefore, "parsing
> problems in QUrl" cannot be true, as it would be KUrl parsing problems too.
> KUrl exists mostly to keep KDE 3 KURL API compatibility.

Yes, but not only. It also makes QUrl actually work :-)

QUrl::toString() is flawed. If you have a file with '#' in its name, you'll get
the '#' unescaped in the toString() output. Parsing that output back treats
the '#' as the start of a fragment, so the result is only useful for display
purposes, not for anything that needs to be a URL again.
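
A minimal sketch of the '#' problem, using only the C++ standard library
rather than Qt. decodeAll() and splitFragment() are hypothetical helpers
invented for illustration; they are not QUrl or KUrl API, and the splitting
is deliberately simplified to "first '#' starts the fragment".

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: decode every %XX escape, delimiters included.
// This mimics a "fully decoded" string form.
std::string decodeAll(const std::string &in)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += in[i];  // not a valid escape: copy the character as-is
    }
    return out;
}

// Hypothetical helper: split at the first '#', the way a URL parser
// separates the fragment from the rest.
std::pair<std::string, std::string> splitFragment(const std::string &url)
{
    const std::size_t pos = url.find('#');
    if (pos == std::string::npos)
        return {url, std::string()};
    return {url.substr(0, pos), url.substr(pos + 1)};
}
```

With these helpers, splitFragment("file:///tmp/a%23b.txt") finds no fragment,
while splitFragment(decodeAll("file:///tmp/a%23b.txt")) yields
"file:///tmp/a" plus the fragment "b.txt": the file name has been cut in two.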

Similarly, the construction of a url from a string should be based on the
encoded format, not on the ambiguous decoded format.
For this reason the KUrl(const QString& url) constructor calls
  setEncodedUrl(url, QUrl::TolerantMode)
rather than QUrl(url), which is described as taking "human readable, not
percent encoded" input.

Experience shows that all code should work with percent-encoded URLs, and
that they should be made human readable only at the point where they are
shown to the user.
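
As an illustration of keeping the encoded form as the working
representation, here is a small standard-library sketch that percent-encodes
a local file path before it is placed into a URL. encodePathForUrl() is a
hypothetical name, not Qt or KDE API, and the set of characters left
unescaped (ASCII letters, digits, "-._~" and the '/' separator) is an
assumption made for this example.

```cpp
#include <string>

// Hypothetical helper: percent-encode a file path for use as a URL path.
// Assumption: only unreserved ASCII characters and '/' stay literal;
// every other byte is written as %XX with uppercase hex digits.
std::string encodePathForUrl(const std::string &path)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : path) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved || c == '/') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

For example, encodePathForUrl("/tmp/a#b c.txt") gives "/tmp/a%23b%20c.txt".
That string can be stored, compared and passed around without ambiguity, and
only needs to be decoded when it is displayed.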

> In any case, QUrl in Qt 5 requires a rewrite of its API. Not the parsing --
> that one is fine.

Yes, the parsing code is fine, but parsing from the encoded form should be
the default (in tolerant mode, btw), rather than parsing from the
human-readable form.

> The slash character may be a corner case, but these two are also definitely
> not the same:
> 	http://localhost/foo?arg=value#anchor
> 	http://localhost/foo%3Farg=value%23anchor
> 
> QUrl decodes the second URL properly, and QUrl::path() returns
> "/foo?arg=value#anchor", which is fine. But then if you call
> QUrl::toString(), you get the first URL, which is *not* fine, as we
> established that they are different URLs. And to top it all off, QUrl's
> constructor uses the same flawed fully-decoded notation.

Right, glad to see we agree :-)

> In my view, QUrl should be modified to use *only* partially-decoded
> components and provide a method (toEncoded()) that returns the
> fully-encoded form for proper network transfer. The partially-decoded form
> would decode %-encodings that are UTF-8 sequences, including %20 to space,
> but not including delimiter characters (so it won't decode %3F to a
> question mark in a path component, but it would decode it in the query and
> fragment component).

OK. That seems to match the "pretty URL" notion from KDE: something that is a
bit more human-readable, but which is still fully correct and can be parsed
back into the same URL.
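
A rough sketch of what such a partially-decoded form could look like for a
path component, again in plain standard C++. prettyPath() is a hypothetical
helper, not the KDE "pretty URL" implementation, and the set of characters
kept encoded ('#', '?', '/' and '%') is an assumption for this example. It
covers the path only; the query and fragment would need their own rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode %XX escapes in a path component, but leave
// escapes for delimiter characters untouched so the result still parses
// back to the same URL.
std::string prettyPath(const std::string &encodedPath)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    // Assumed set of characters that must stay encoded inside a path.
    const std::string keepEncoded = "#?/%";
    std::string out;
    for (std::size_t i = 0; i < encodedPath.size(); ++i) {
        if (encodedPath[i] == '%' && i + 2 < encodedPath.size()) {
            const int hi = hexValue(encodedPath[i + 1]);
            const int lo = hexValue(encodedPath[i + 2]);
            if (hi >= 0 && lo >= 0) {
                const char decoded = static_cast<char>(hi * 16 + lo);
                if (keepEncoded.find(decoded) == std::string::npos) {
                    out += decoded;  // safe to show decoded
                    i += 2;
                    continue;
                }
            }
        }
        out += encodedPath[i];  // delimiter escape or plain character
    }
    return out;
}
```

Here prettyPath("/my%20file%23.txt") gives "/my file%23.txt", and
prettyPath("/foo%3Farg=value%23anchor") is returned unchanged, so the second
URL from the quoted example is never turned into the first one.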

-- 
David Faure, faure at kde.org, http://www.davidfaure.fr
Sponsored by Nokia to work on KDE, incl. Konqueror (http://www.konqueror.org).

