[Development] QUrl fully-decoded path API

lars.knoll at nokia.com
Tue May 15 09:02:10 CEST 2012

Hi Thiago,

reading through the whole thread, I am leaning towards proposal 2. It
might be slightly more difficult to document, but has less API clutter and
is probably easier for us to maintain in the longer term.


On 5/14/12 6:29 PM, "ext Thiago Macieira" <thiago.macieira at intel.com> wrote:

>David and I have been discussing for the past week one of the consequences
>of QUrl operating on encoded data only in Qt 5. There are a few use-cases
>where a fully-decoded path is necessary.
>== Rationale ==
>(skip to proposal if you find this lengthy)
>I've already had to implement the full decoding so that
>would work. But the same process might be necessary for non-local files. For
>example, from qnetworkaccessftpbackend.cpp:
>        if (operation() == QNetworkAccessManager::GetOperation) {
>            setCachingEnabled(true);
>            ftp->get(url().path(), 0, type);
>        } else {
>            ftp->put(uploadDevice, url().path(), type);
>        }
>If the URL contained a percent-encoded character that QUrl::path() does not
>decode, it will remain in the path and be sent to the FTP server. More than
>likely, that is not what was intended. The characters that QUrl does not
>decode under any circumstances are:
>	- control characters between 0x00 and 0x1F
>	- the percent character itself (0x25)
>	- the delete control character (0x7F)
>	- high-bit byte sequences that cannot be decoded as UTF-8
>Especially because of the last category, the percent sign can never be
>decoded. Those arbitrary binary sequences can appear anywhere in the
>user info, path, query or fragment, and the code dealing with them is
>Moreover, encoded paths are the correct way to deal with paths when dealing
>with a URL's most common use: HTTP and the web.
>(as a twist of fate, the HTTP backend doesn't use QUrl::path(), but
>QUrl::toString(QUrl::RemoveAuthority | QUrl::RemoveFragment) so it gets
>the path and the query)
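The decoding rules listed above can be sketched in plain C++, independent of Qt. This is only an illustration, not QUrl's implementation: prettyDecode, staysEncoded and hexValue are hypothetical names, and as a simplification the sketch keeps every high-bit byte encoded rather than checking whether the sequence is valid UTF-8.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Value of one hexadecimal digit; the caller must pass a valid hex digit.
static int hexValue(char c) {
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// True for bytes that this sketch always leaves percent-encoded.
// Simplification (assumption): all high-bit bytes stay encoded, whereas the
// list above only excludes sequences that cannot be decoded as UTF-8.
static bool staysEncoded(unsigned char byte) {
    return byte <= 0x1F || byte == 0x25 || byte == 0x7F || byte >= 0x80;
}

// Decode %XX escapes, except those protected by staysEncoded().
std::string prettyDecode(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool isEscape = in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (isEscape) {
            const unsigned char byte = static_cast<unsigned char>(
                hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
            if (!staysEncoded(byte)) {
                out += static_cast<char>(byte);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];  // literal character, or an escape left untouched
    }
    return out;
}
```

With these rules "/a%20b" becomes "/a b", while "/100%25" is returned unchanged, which is exactly the case where a caller such as the FTP backend would still see an encoded character.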
>The same applies to setting the path. Often, the data comes in a decoded
>form from other contexts, such as user input or an FTP directory listing. For
>those, encoding is necessary, like QUrl::fromLocalFile does.
>    url.setPath(deslashified.replace(QLatin1Char('%'),
>As David pointed out in an email to me, no one who hasn't had full URL
>training will be able to write the code properly.
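To make the encoding step concrete, here is a minimal sketch in plain C++ of what a caller would have to do by hand before passing decoded text to a setter that expects encoded input. The function name encodeDecodedPath and the exact set of escaped bytes are assumptions for illustration; this is not what QUrl::fromLocalFile does internally.

```cpp
#include <string>

// Percent-encode the bytes that must not appear literally in an encoded
// path: control characters, '%' itself, DEL, and the '?' and '#' delimiters.
// The set is illustrative (assumption); a real implementation covers more.
std::string encodeDecodedPath(const std::string &decoded) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : decoded) {
        const bool mustEncode = byte <= 0x1F || byte == 0x7F
            || byte == '%' || byte == '?' || byte == '#';
        if (mustEncode) {
            out += '%';
            out += hex[byte >> 4];    // high nibble
            out += hex[byte & 0x0F];  // low nibble
        } else {
            out += static_cast<char>(byte);
        }
    }
    return out;
}
```

Forgetting any one of these cases, most commonly the '%' itself, silently changes the meaning of the path, which is the difficulty described above.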
>== Proposal 1 ==
>Add QUrl::decodedPath() and QUrl::setDecodedPath(), operating on QString,
>which do the necessary encoding and decoding. QUrl::fromLocalFile will
>call that function instead of doing the work above, and
>extra decoder will be moved to the new function.
>The documentation will need to be updated to indicate when to use each.
>== Problem 2 ==
>The same problem that applies to the path can potentially apply to other
>components of the URL: user name, password, fragment and query. For example,
>imagine using the following random-generated password (I generated using
>	url.setPassword("}}>b9o%kR(");
>The above will trigger the tolerant-mode's corrector and will transform
>'%' into "%25". However, when trying to send the password to the server, for
>example using QAuthenticator, we might make this mistake (copied from
>        // if credentials are included in the url, then use them
>        if (!url.userName().isEmpty()
>            && !url.password().isEmpty()) {
>            authenticator->setUser(url.userName());
>            authenticator->setPassword(url.password());
>        }
>[by the way, this code should test if !userInfo().isEmpty(), to catch empty
>passwords too]
>Then we end up setting the password to "}}>b9o%25kR(", which is very likely
>to be incorrect.
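The correction described above can be illustrated with a small plain C++ sketch. It assumes the rule is simply "a '%' that is not followed by two hexadecimal digits becomes %25"; the real tolerant-mode corrector in QUrl may apply further rules, and fixStrayPercents is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Turn every '%' that does not start a valid %XX escape into "%25".
// Valid escapes are passed through unchanged.
std::string fixStrayPercents(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out += in[i];
        if (in[i] != '%')
            continue;
        const bool validEscape = i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (!validEscape)
            out += "25";  // the '%' was literal data, so it becomes "%25"
    }
    return out;
}
```

Applied to the password above, the stored form contains "%25", so any code that reads it back without a full decode hands the server a different password from the one the user typed.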
>== Proposal 2 ==
>So instead of adding decodedPath(), decodedUserName(), decodedPassword(),
>and cluttering the Qt5 QUrl API like the Qt4 one was, there's a separate
> - add an option to QUrl::ComponentFormattingOptions to execute full decoding
> - add a new value to QUrl::ParsingMode to indicate full decoded parsing
> - modify all setters so that they take QUrl::ParsingMode too (like
>These new options should not be allowed in QUrl's constructor,
>QUrl::url, toString and toEncoded, for which full decoding creates
>data (the root flaw in QUrl in Qt 4).
>Pros over proposal 1:
> - less API clutter
> - centralised handling of the decoding and encoding
> - also allows for StrictMode setting of components and error reporting
>Cons over proposal 1:
> - less discoverable and harder to document that the option is needed in
>cases like the FTP one above.
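The shape of proposal 2 can be sketched with a toy class in plain C++. Everything here is hypothetical: MiniUrl, Format::FullyDecoded and Mode::DecodedInput are stand-ins chosen for illustration, not the names that would be added to QUrl, and the encoder only handles '%' to keep the sketch short.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-ins for the proposed option values; not Qt names.
enum class Format { Encoded, FullyDecoded };
enum class Mode { EncodedInput, DecodedInput };

class MiniUrl {
public:
    // One setter with a mode argument instead of a second setDecodedPath().
    void setPath(const std::string &path, Mode mode = Mode::EncodedInput) {
        encodedPath_ = (mode == Mode::DecodedInput) ? encodePercents(path) : path;
    }
    // One getter with a format argument instead of a second decodedPath().
    std::string path(Format format = Format::Encoded) const {
        return format == Format::FullyDecoded ? decodeAll(encodedPath_)
                                              : encodedPath_;
    }

private:
    static int hexValue(char c) {
        assert(std::isxdigit(static_cast<unsigned char>(c)));
        if (c >= '0' && c <= '9') return c - '0';
        return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
    }
    // Simplified encoder (assumption): only '%' is escaped.
    static std::string encodePercents(const std::string &in) {
        std::string out;
        for (char c : in) {
            if (c == '%') out += "%25"; else out += c;
        }
        return out;
    }
    // Decode every valid %XX escape, including %25.
    static std::string decodeAll(const std::string &in) {
        std::string out;
        for (std::size_t i = 0; i < in.size(); ++i) {
            if (in[i] == '%' && i + 2 < in.size()
                && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
                && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
                out += static_cast<char>(hexValue(in[i + 1]) * 16
                                         + hexValue(in[i + 2]));
                i += 2;
            } else {
                out += in[i];
            }
        }
        return out;
    }
    std::string encodedPath_;  // the URL component is always stored encoded
};
```

The point of the design is that the stored state stays encoded and the caller opts into full decoding per call, so the same mechanism extends to user name, password, fragment and query without adding a new pair of functions for each.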
>Which one shall it be?
>Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel Open Source Technology Center
>     Intel Sweden AB - Registration Number: 556189-6027
>     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden