[Development] QUrl fully-decoded path API

Thiago Macieira thiago.macieira at intel.com
Mon May 14 18:29:23 CEST 2012


Hello

David and I have been discussing for the past week one of the consequences of 
QUrl operating on encoded data only in Qt 5. There are a few use-cases where a 
fully-decoded path is necessary.

== Rationale ==
(skip to proposal if you find this lengthy)

I've already had to implement the full decoding so that QUrl::toLocalFile() 
would work. But the same process might be necessary for non-local files. For 
example, from qnetworkaccessftpbackend.cpp:

        if (operation() == QNetworkAccessManager::GetOperation) {
            setCachingEnabled(true);
            ftp->get(url().path(), 0, type);
        } else {
            ftp->put(uploadDevice, url().path(), type);
        }

If the URL contained a percent-encoded character that QUrl::path() doesn't 
decode, that sequence will remain in the path and be sent to the FTP server. 
More than likely, that's not what was intended. The characters that QUrl does 
not decode under any circumstances are:

	- control characters between 0x00 and 0x1F
	- the percent character itself (0x25)
	- the delete control character (DEL, 0x7F)
	- high-bit byte sequences that cannot be decoded as UTF-8

Especially because of the last category, the percent sign can never be 
decoded. Those arbitrary binary sequences can appear anywhere in the URL's 
user info, path, query or fragment, and the code dealing with them is common. 
Moreover, encoded paths are the correct form to work with for a URL's most 
common use: HTTP and the web.
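
To make the list above concrete, here is a minimal sketch in plain C++ (no Qt) 
of a decoder that leaves those categories percent-encoded. The names 
partiallyDecode and staysEncoded are hypothetical, and the sketch simplifies 
the last category: it keeps every high-bit byte encoded instead of checking 
whether the sequence is valid UTF-8. It is not QUrl's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// True for decoded byte values that this sketch never decodes.
static bool staysEncoded(unsigned char byte)
{
    return byte <= 0x1F   // control characters
        || byte == 0x25   // the percent character itself
        || byte == 0x7F   // DEL
        || byte >= 0x80;  // simplification: all high-bit bytes stay encoded
}

// Hypothetical helper: decode %XX triplets except the categories above.
std::string partiallyDecode(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()) {
            const int hi = hexValue(encoded[i + 1]);
            const int lo = hexValue(encoded[i + 2]);
            if (hi >= 0 && lo >= 0) {
                const unsigned char byte =
                    static_cast<unsigned char>(hi * 16 + lo);
                if (staysEncoded(byte))
                    out.append(encoded, i, 3);   // copy the triplet untouched
                else
                    out += static_cast<char>(byte);
                i += 2;                          // skip the two hex digits
                continue;
            }
        }
        out += encoded[i];
    }
    return out;
}
```

With this sketch, "/%41" becomes "/A", while "/100%25", "/x%7F" and "/%FF" 
come back unchanged, which is exactly the kind of leftover that would end up 
being sent to the FTP server.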

(as a twist of fate, the HTTP backend doesn't use QUrl::path(), but 
QUrl::toString(QUrl::RemoveAuthority | QUrl::RemoveFragment) so it gets both 
the path and the query)

The same applies to setting the path. Often, the data comes in a decoded form 
from other contexts, such as user input or an FTP directory listing. For 
those, encoding is necessary, like QUrl::fromLocalFile does.

    url.setPath(deslashified.replace(QLatin1Char('%'), QStringLiteral("%25")));

As David pointed out in an email to me, no one who hasn't had full training 
in URLs will be able to write that code properly.
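
For illustration, the encoding step for a decoded path could look roughly 
like the sketch below, in plain C++ without Qt. The function name 
encodeDecodedPath is hypothetical. Besides the '%' replacement shown above, it 
also escapes control characters and DEL, which is an assumption of this sketch 
and not a description of what QUrl::fromLocalFile does.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn a fully-decoded path into a form in which every
// '%' is unambiguously literal and control bytes are percent-encoded.
std::string encodeDecodedPath(const std::string &decoded)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : decoded) {
        if (byte == '%' || byte <= 0x1F || byte == 0x7F) {
            out += '%';
            out += hex[byte >> 4];     // high nibble
            out += hex[byte & 0x0F];   // low nibble
        } else {
            out += static_cast<char>(byte);
        }
    }
    return out;
}
```

A file literally named "a%20b" would become "a%2520b" here, so that a later 
decode returns the original name and not "a b".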

== Proposal 1 ==

Add QUrl::decodedPath() and QUrl::setDecodedPath(), operating on QString, 
which do the necessary encoding and decoding. QUrl::fromLocalFile will call 
the new setter instead of doing the work above, and QUrl::toLocalFile's extra 
decoder will be moved to the new getter.

The documentation will need to be updated to indicate when to use each.

== Problem 2 ==

The same problem that applies to the path can potentially apply to other 
components of the URL: user name, password, fragment and query. For example, 
imagine using the following randomly generated password (I generated it with 
KeePassX):

	url.setPassword("}}>b9o%kR(");

The above will trigger tolerant mode's corrector, which will transform the 
'%' into "%25". However, when trying to send the password to the server, for 
example using QAuthenticator, we might make this mistake (copied from 
qnetworkaccessmanager.cpp):

        // if credentials are included in the url, then use them
        if (!url.userName().isEmpty()
            && !url.password().isEmpty()) {
            authenticator->setUser(url.userName());
            authenticator->setPassword(url.password());
[by the way, this code should test if !userInfo().isEmpty(), to catch empty 
passwords too]

Then we end up setting the password to "}}>b9o%25kR(", which is very likely 
to be incorrect.
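
The round trip can be sketched in plain C++ (no Qt). Both function names are 
hypothetical, and correctStrayPercents models only the single correction 
described above (a '%' that does not start a valid %XX triplet becomes "%25"); 
the real tolerant-mode corrector may do more.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Sketch of the correction: a '%' not followed by two hex digits is
// rewritten as "%25"; valid triplets are left alone.
std::string correctStrayPercents(const std::string &input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const bool triplet = input[i] == '%' && i + 2 < input.size()
            && hexValue(input[i + 1]) >= 0 && hexValue(input[i + 2]) >= 0;
        out += input[i];
        if (input[i] == '%' && !triplet)
            out += "25";
    }
    return out;
}

// Decode every valid %XX triplet to its raw byte.
std::string fullyDecode(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()) {
            const int hi = hexValue(encoded[i + 1]);
            const int lo = hexValue(encoded[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;   // skip the two hex digits
                continue;
            }
        }
        out += encoded[i];
    }
    return out;
}
```

In this sketch the stored form of the password above is "}}>b9o%25kR(", and 
only fullyDecode() gives back what the user typed. Note also that an input 
such as "a%41" is taken as already encoded and decodes to "aA", so a setter 
that guesses cannot be right for every decoded input.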

== Proposal 2 ==

So instead of adding decodedPath(), decodedUserName(), decodedPassword(), etc. 
and cluttering the Qt5 QUrl API like the Qt4 one was, there's a separate 
proposal:

 - add an option to QUrl::ComponentFormattingOptions to execute full decoding
 - add a new value to QUrl::ParsingMode to indicate full decoded parsing
 - modify all setters so that they take QUrl::ParsingMode too (like 
QUrl::setUrl)

These new options should not be allowed in QUrl's constructor, QUrl::setUrl, 
QUrl::url, toString and toEncoded, for which full decoding creates ambiguous 
data (the root flaw in QUrl in Qt 4).
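
As a rough illustration of the shape this would take, here is a standalone 
C++ sketch. Every name in it (UrlComponent, ParsingMode::Decoded, 
Formatting::FullyDecoded) is hypothetical and stands in for the proposed 
options; none of it is existing QUrl API, and StrictMode is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the proposed enum values.
enum class ParsingMode { Tolerant, Decoded };
enum class Formatting { Encoded, FullyDecoded };

// Value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static bool isTriplet(const std::string &s, std::size_t i)
{
    return s[i] == '%' && i + 2 < s.size()
        && hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0;
}

// One URL component (path, password, ...) that stores encoded data only.
class UrlComponent
{
public:
    void set(const std::string &value, ParsingMode mode)
    {
        m_encoded.clear();
        for (std::size_t i = 0; i < value.size(); ++i) {
            m_encoded += value[i];
            // Decoded: every '%' is literal. Tolerant: only stray ones are.
            if (value[i] == '%'
                && (mode == ParsingMode::Decoded || !isTriplet(value, i)))
                m_encoded += "25";
        }
    }

    std::string get(Formatting format) const
    {
        if (format == Formatting::Encoded)
            return m_encoded;
        std::string out;
        for (std::size_t i = 0; i < m_encoded.size(); ++i) {
            if (isTriplet(m_encoded, i)) {
                out += static_cast<char>(hexValue(m_encoded[i + 1]) * 16
                                         + hexValue(m_encoded[i + 2]));
                i += 2;   // skip the two hex digits
            } else {
                out += m_encoded[i];
            }
        }
        return out;
    }

private:
    std::string m_encoded;
};
```

The point of the sketch is that the caller states which form the data is in: 
set("a%41", ParsingMode::Decoded) stores "a%2541" and reads back as "a%41", 
whereas the tolerant setter stores "a%41" and reads back as "aA".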

Pros over proposal 1:
 - less API clutter
 - centralised handling of the decoding and encoding
 - also allows for StrictMode setting of components and error reporting

Cons over proposal 1:
 - less discoverable and harder to document that the option is needed in cases 
like the FTP one above.

Which one shall it be?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden