[Qt5-feedback] QUrl (was: QMimeType)
Matthias Fuchs
mat69 at gmx.net
Fri Jun 10 21:00:15 CEST 2011
On Sunday 05 June 2011, 12:00:53, Thiago Macieira wrote:
> On Sunday, 5 June 2011, at 10:00:37, Ivan Čukić wrote:
> > It currently inherits KServiceType which will be changed, and it uses
> > KUrl which largely exists due to some parsing problems in QUrl. We are
> > hoping to push the fixes to QUrl to allow us to drop KUrl in kdelibs
> > 5.
>
> KUrl doesn't do any parsing. It uses QUrl for parsing. Therefore, "parsing
> problems in QUrl" cannot be true, as it would be KUrl parsing problems too.
>
> KUrl exists mostly to keep KDE 3 KURL API compatibility.
>
> In any case, QUrl in Qt 5 requires a rewrite of its API. Not the parsing --
> that one is fine. QUrl has a completely flawed API, owing to a long-standing
> misunderstanding of what a URL is.
>
> URLs and URIs are "designed by committee" and are simultaneously:
> - Unicode
> - UTF-8 encoded
> - binary
>
> So the following two URLs are the same:
> http://localhost/R%C3%A9sum%C3%A9.pdf
> http://localhost/Résumé.pdf
> but the following URL is permitted too:
> http://localhost/R%E9sum%E9.pdf
>
> Note how "é" expands to %C3%A9 (URLs are Unicode UTF-8 encoded) but at the
> same time the byte 0xE9 is permitted too (non-UTF8). QString is therefore
> inadequate to represent this in fully-decoded form for the path component:
> it is "/Résumé.pdf" for the first two URLs, but what is its value for the
> third?
>
> Also note how the following two URLs are *not* the same:
> http://localhost/foo/bar
> http://localhost/foo%2Fbar
> despite the slash character being 0x2F.
>
> So again QString is inadequate to represent a component of a URL in
> fully-decoded form, which is what QUrl::path() returns. At the same time,
> QUrl::encodedPath() returning a QByteArray with %-encoding is hard to use.
>
> The slash character may be a corner case, but these two are also definitely
> not the same:
> http://localhost/foo?arg=value#anchor
> http://localhost/foo%3Farg=value%23anchor
>
> QUrl decodes the second URL properly, and QUrl::path() returns
> "/foo?arg=value#anchor", which is fine. But then if you call
> QUrl::toString(), you get the first URL, which is *not* fine, as we
> established that they are different URLs. And to top it all off, QUrl's
> constructor uses the same flawed fully-decoded notation.
>
> In my view, QUrl should be modified to use *only* partially-decoded
> components and provide a method (toEncoded()) that returns the
> fully-encoded form for proper network transfer. The partially-decoded form
> would decode %-encodings that are UTF-8 sequences, including %20 to space,
> but not including delimiter characters (so it won't decode %3F to a
> question mark in a path component, but it would decode it in the query and
> fragment component).
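A minimal sketch of the partially-decoded form described above, written in plain standard C++ rather than against Qt. The function `partiallyDecode` and its `delimiters` parameter are hypothetical names, not part of the QUrl API, and the choice of which characters count as delimiters for a given component (for example "/?#" for a path) is an assumption left to the caller. A run of %-escapes is decoded only when it forms one complete, valid UTF-8 character, so %C3%A9 becomes "é" while a lone %E9 stays encoded; '%' itself and control characters are also kept encoded so the result stays unambiguous.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Byte value of the "%XX" escape starting at s[i], or -1 if there is none.
static int escapeAt(const std::string& s, std::size_t i) {
    if (i + 2 >= s.size() || s[i] != '%') return -1;
    int hi = hexValue(s[i + 1]);
    int lo = hexValue(s[i + 2]);
    return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
}

// Number of bytes in a UTF-8 sequence with this lead byte, 0 if invalid lead.
static int utf8Length(int lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Allowed range of the first continuation byte (RFC 3629 byte ranges);
// rejects overlong forms, surrogates and code points above U+10FFFF.
static bool secondByteOk(int lead, int b) {
    if (lead == 0xE0) return b >= 0xA0 && b <= 0xBF;
    if (lead == 0xED) return b >= 0x80 && b <= 0x9F;
    if (lead == 0xF0) return b >= 0x90 && b <= 0xBF;
    if (lead == 0xF4) return b >= 0x80 && b <= 0x8F;
    return b >= 0x80 && b <= 0xBF;
}

// Hypothetical partial decoder: decodes %-escapes that spell a complete UTF-8
// character, but keeps '%', control characters and this component's
// delimiters %-encoded.
std::string partiallyDecode(const std::string& in, const std::string& delimiters) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        int lead = escapeAt(in, i);
        if (lead < 0) {           // not an escape: copy the character as-is
            out += in[i++];
            continue;
        }
        int len = utf8Length(lead);
        bool ok = len > 0;
        std::string bytes(1, static_cast<char>(lead));
        for (int k = 1; ok && k < len; ++k) {
            int b = escapeAt(in, i + 3 * static_cast<std::size_t>(k));
            ok = (k == 1) ? secondByteOk(lead, b) : (b >= 0x80 && b <= 0xBF);
            if (ok) bytes += static_cast<char>(b);
        }
        if (ok && len == 1) {     // ASCII: refuse to decode meaningful characters
            char c = static_cast<char>(lead);
            ok = c != '%' && lead >= 0x20 && lead != 0x7F &&
                 delimiters.find(c) == std::string::npos;
        }
        if (ok) {                 // emit the decoded UTF-8 character
            out += bytes;
            i += 3 * static_cast<std::size_t>(len);
        } else {                  // keep this single escape exactly as written
            out.append(in, i, 3);
            i += 3;
        }
    }
    return out;
}
```

With "/?#" as the path delimiters, `partiallyDecode("/foo%2Fbar", "/?#")` keeps the `%2F`, so the two example URLs stay distinct, and `/R%E9sum%E9.pdf` comes back unchanged because %E9 alone is not valid UTF-8. For a fragment, passing an empty delimiter set lets %3F decode to "?".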
Would the planned changes also fix the problem that appears when file names
contain non-UTF-8 bytes? Such files can't be renamed in the UI [1], because in
the internally used file name the "incorrect" bytes are simply replaced with
the Unicode replacement character "�" (U+FFFD, a question mark in a diamond).
[1] At least in KDE.
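To illustrate why such names cannot be renamed, here is a minimal standard-C++ sketch, assuming the file name is converted from raw bytes to UTF-8 by substituting U+FFFD for every byte that is not part of a valid sequence. The helper names `validSequenceLength` and `toUtf8Lossy` are hypothetical, and the exact substitution rule used by Qt or KDE may differ. The point is that the converted name no longer matches the bytes on disk, and distinct names can collapse into the same string, so the original cannot be recovered from it.

```cpp
#include <cstddef>
#include <string>

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if the
// bytes there are not valid UTF-8 (follows the RFC 3629 byte ranges).
static std::size_t validSequenceLength(const std::string& s, std::size_t i) {
    auto byteAt = [&s](std::size_t k) {
        return k < s.size() ? static_cast<int>(static_cast<unsigned char>(s[k])) : -1;
    };
    const int lead = byteAt(i);
    if (lead < 0x80) return 1;                 // plain ASCII
    std::size_t len = 0;
    int lo = 0x80, hi = 0xBF;                  // allowed range of the 2nd byte
    if (lead >= 0xC2 && lead <= 0xDF) {
        len = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        len = 3;
        if (lead == 0xE0) lo = 0xA0;           // reject overlong forms
        if (lead == 0xED) hi = 0x9F;           // reject UTF-16 surrogates
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        len = 4;
        if (lead == 0xF0) lo = 0x90;           // reject overlong forms
        if (lead == 0xF4) hi = 0x8F;           // reject code points above U+10FFFF
    } else {
        return 0;                              // 0x80-0xC1 or 0xF5-0xFF
    }
    for (std::size_t k = 1; k < len; ++k) {
        const int b = byteAt(i + k);
        const int min = (k == 1) ? lo : 0x80;
        const int max = (k == 1) ? hi : 0xBF;
        if (b < min || b > max) return 0;      // also catches truncation (-1)
    }
    return len;
}

// Hypothetical lossy conversion: every byte that is not part of a valid
// UTF-8 sequence becomes U+FFFD (encoded as EF BF BD).
std::string toUtf8Lossy(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size();) {
        std::size_t len = validSequenceLength(raw, i);
        if (len == 0) {
            out += "\xEF\xBF\xBD";
            ++i;
        } else {
            out.append(raw, i, len);
            i += len;
        }
    }
    return out;
}
```

For a Latin-1 name such as `"R\xE9sum\xE9.pdf"` the result is "R�sum�.pdf", which is not the name stored on disk, and `"\xE9"` and `"\xE8"` both become a single "�". Any rename issued with the converted string therefore targets a file that does not exist.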