[Development] QUrl fully-decoded path API
Thiago Macieira
thiago.macieira at intel.com
Mon May 14 18:29:23 CEST 2012
Hello
For the past week, David and I have been discussing one of the consequences of
QUrl operating only on encoded data in Qt 5: there are a few use cases where a
fully-decoded path is necessary.
== Rationale ==
(skip to proposal if you find this lengthy)
I've already had to implement the full decoding so that QUrl::toLocalFile()
would work. But the same process might be necessary for non-local files. For
example, from qnetworkaccessftpbackend.cpp:
if (operation() == QNetworkAccessManager::GetOperation) {
    setCachingEnabled(true);
    ftp->get(url().path(), 0, type);
} else {
    ftp->put(uploadDevice, url().path(), type);
}
If the URL contains a percent-encoded character that QUrl::path() doesn't
decode, that escape sequence will remain in the path and be sent to the FTP
server as-is. More than likely, that is not what was intended. The characters
that QUrl never decodes, under any circumstances, are:
- control characters between 0x00 and 0x1F
- the percent character itself (0x25)
- the delete control character (0x7F)
- high-bit byte sequences that cannot be decoded as UTF-8
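To make the single-byte part of that rule concrete, here is a minimal sketch in plain standard C++ (not QUrl's implementation). `staysEncoded` is a hypothetical helper; the UTF-8 category is deliberately left out, because whether a high-bit byte is decodable depends on its neighbours, not on the byte alone.

```cpp
// Hypothetical helper: true if a byte produced by a %XX escape belongs to one
// of the single-byte categories listed above and would therefore stay encoded.
// High-bit bytes (the UTF-8 category) need a check over the whole sequence,
// which this sketch does not attempt, so they are reported as false here.
bool staysEncoded(unsigned char byte)
{
    return byte <= 0x1F     // control characters 0x00..0x1F
        || byte == 0x25     // '%' itself
        || byte == 0x7F;    // DEL
}
```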
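Especially because of the last category, the percent sign can never be
decoded: the undecodable bytes must stay as %XX escapes, so a literal '%' has
to stay as %25 or the two could not be told apart. Those arbitrary binary
sequences can appear anywhere in the URL's user info, path, query or fragment,
and the code dealing with them is shared by all of those components.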
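What "full decoding" means can be sketched in standard C++. `fullyDecode` below is a hypothetical helper, not a Qt function: it turns every valid %XX escape into the raw byte, '%' included, so its result is a byte string that need not be valid UTF-8. In the FTP example, this is the kind of value the server needs rather than the partially decoded path.

```cpp
#include <cstddef>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical "full decoding": every %XX escape becomes the raw byte 0xXX,
// including %25, control characters and non-UTF-8 bytes, so the result is a
// byte string rather than guaranteed-valid text. A '%' that does not start a
// valid escape is copied through unchanged.
std::string fullyDecode(const std::string &encoded)
{
    std::string out;
    out.reserve(encoded.size());
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()) {
            int hi = hexDigitValue(encoded[i + 1]);
            int lo = hexDigitValue(encoded[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += encoded[i];
    }
    return out;
}
```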
Moreover, encoded paths are the correct form to use for a URL's most common
use: HTTP and the web.
(as a twist of fate, the HTTP backend doesn't use QUrl::path(), but
QUrl::toString(QUrl::RemoveAuthority | QUrl::RemoveFragment) so it gets both
the path and the query)
The same applies to setting the path. Often, the data comes in a decoded form
from other contexts, such as user input or an FTP directory listing. For
those, encoding is necessary, like QUrl::fromLocalFile does.
url.setPath(deslashified.replace(QLatin1Char('%'), QStringLiteral("%25")));
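The replace() call above can be mirrored in standard C++ as follows; `escapePercent` is a hypothetical name used only for illustration. Escaping every '%' first means that data which merely looks like an escape, such as a file named "a%41", survives a later decode instead of turning into "aA".

```cpp
#include <string>

// Hypothetical helper mirroring replace(QLatin1Char('%'), QStringLiteral("%25")):
// make decoded data safe to hand to a setter that interprets percent-encoding
// by escaping every literal '%'.
std::string escapePercent(const std::string &decoded)
{
    std::string out;
    out.reserve(decoded.size());
    for (char c : decoded) {
        if (c == '%')
            out += "%25";  // a literal percent sign must not start an escape
        else
            out += c;
    }
    return out;
}
```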
As David pointed out in an email to me, no one without thorough training in
URL encoding will be able to write this code properly.
== Proposal 1 ==
Add QUrl::decodedPath() and QUrl::setDecodedPath(), operating on QString,
which do the necessary encoding and decoding. QUrl::fromLocalFile() will call
the new setter instead of doing the work above, and the extra decoder in
QUrl::toLocalFile() will be moved to the new getter.
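A minimal sketch of what Proposal 1 could look like, using a hypothetical `ToyUrl` class in standard C++ with std::string in place of QString. Only the method names decodedPath()/setDecodedPath() come from the proposal; the encoding rule (keep RFC 3986 unreserved characters and '/', escape every other byte) is an assumption and is not QUrl's real rule set.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for QUrl that stores only a percent-encoded path.
class ToyUrl {
public:
    // Path exactly as stored: percent-encoded.
    const std::string &path() const { return m_encodedPath; }

    // Takes a fully-decoded path (e.g. from an FTP listing) and encodes it.
    void setDecodedPath(const std::string &decoded)
    {
        static const char hex[] = "0123456789ABCDEF";
        m_encodedPath.clear();
        for (unsigned char c : decoded) {
            if (isUnreservedOrSlash(c)) {
                m_encodedPath += static_cast<char>(c);
            } else {
                m_encodedPath += '%';
                m_encodedPath += hex[c >> 4];
                m_encodedPath += hex[c & 0x0F];
            }
        }
    }

    // Returns the path with every %XX escape decoded, '%' included.
    std::string decodedPath() const
    {
        std::string out;
        for (std::size_t i = 0; i < m_encodedPath.size(); ++i) {
            if (m_encodedPath[i] == '%' && i + 2 < m_encodedPath.size()) {
                int hi = hexValue(m_encodedPath[i + 1]);
                int lo = hexValue(m_encodedPath[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out += static_cast<char>(hi * 16 + lo);
                    i += 2;  // skip the two hex digits
                    continue;
                }
            }
            out += m_encodedPath[i];
        }
        return out;
    }

private:
    // Assumed "safe" set: RFC 3986 unreserved characters plus '/'.
    static bool isUnreservedOrSlash(unsigned char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '-' || c == '.'
            || c == '_' || c == '~' || c == '/';
    }

    static int hexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    std::string m_encodedPath;
};
```

Usage: after setDecodedPath("/pub/50% off.txt"), path() holds "/pub/50%25%20off.txt" and decodedPath() returns the original string, so callers never touch escapes themselves.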
The documentation will need to be updated to indicate when to use each.
== Problem 2 ==
The same problem that applies to the path can potentially apply to the other
components of the URL: user name, password, fragment and query. For example,
imagine using the following randomly generated password (generated with
KeePassX):
url.setPassword("}}>b9o%kR(");
The above will trigger the tolerant-mode corrector, which will transform the
'%' into "%25". However, when sending the password to the server, for example
using QAuthenticator, we might make this mistake (copied from
qnetworkaccessmanager.cpp):
// if credentials are included in the url, then use them
if (!url.userName().isEmpty()
    && !url.password().isEmpty()) {
    authenticator->setUser(url.userName());
    authenticator->setPassword(url.password());
}
[by the way, this code should test !url.userInfo().isEmpty() instead, to catch
empty passwords too]
Then we would end up setting the password to "}}>b9o%25kR(", which is very
likely incorrect.
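The failure can be reproduced with a rough standard C++ approximation of the tolerant-mode correction. `tolerantCorrect` and `fullyDecode` are hypothetical helpers, not QUrl's code: the stored form gains a "%25", and only a full decode gives back the password the user typed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if c is a hexadecimal digit.
bool isHex(char c)
{
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

// Value of a hex digit already checked with isHex().
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Rough approximation of a tolerant-mode correction (not QUrl's code): a '%'
// that does not start a valid %XX escape is taken as a literal percent sign
// and rewritten as "%25".
std::string tolerantCorrect(const std::string &input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        bool validEscape = i + 2 < input.size()
            && isHex(input[i + 1]) && isHex(input[i + 2]);
        if (input[i] == '%' && !validEscape)
            out += "%25";
        else
            out += input[i];
    }
    return out;
}

// Full decoding: every valid %XX escape becomes the raw byte it encodes.
std::string fullyDecode(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()
            && isHex(encoded[i + 1]) && isHex(encoded[i + 2])) {
            out += static_cast<char>(hexValue(encoded[i + 1]) * 16
                                     + hexValue(encoded[i + 2]));
            i += 2;  // skip the two hex digits
        } else {
            out += encoded[i];
        }
    }
    return out;
}
```

Handing the corrected string to the authenticator reproduces the bug described above; its fully decoded form is the value the server actually needs.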
== Proposal 2 ==
So, instead of adding decodedPath(), decodedUserName(), decodedPassword(), etc.
and cluttering the Qt 5 QUrl API the way the Qt 4 one was, here is an
alternative proposal:
- add an option to QUrl::ComponentFormattingOptions to execute full decoding
- add a new value to QUrl::ParsingMode to indicate fully-decoded input
- modify all setters so that they take QUrl::ParsingMode too (like
QUrl::setUrl)
These new options should not be allowed in QUrl's constructor, QUrl::setUrl,
QUrl::url, toString and toEncoded, for which full decoding creates ambiguous
data (the root flaw in QUrl in Qt 4).
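A rough sketch of the shape of Proposal 2, in standard C++. `ModeUrl`, `ParsingMode::Decoded` and `ComponentFormat::FullyDecoded` are illustrative names chosen to echo the proposal, not real Qt API; tolerant mode here simply stores its input unchanged (no correction), and escaping only '%' is a simplifying assumption. The point is structural: every setter and getter goes through the same two helpers, selected by a mode argument.

```cpp
#include <cstddef>
#include <string>

enum class ParsingMode { Tolerant, Decoded };          // how setters read input
enum class ComponentFormat { Encoded, FullyDecoded };  // how getters return data

class ModeUrl {
public:
    void setPath(const std::string &s, ParsingMode m = ParsingMode::Tolerant)
    { m_path = store(s, m); }
    void setPassword(const std::string &s, ParsingMode m = ParsingMode::Tolerant)
    { m_password = store(s, m); }

    std::string path(ComponentFormat f = ComponentFormat::Encoded) const
    { return fetch(m_path, f); }
    std::string password(ComponentFormat f = ComponentFormat::Encoded) const
    { return fetch(m_password, f); }

private:
    // Single encoding step shared by every setter.
    static std::string store(const std::string &s, ParsingMode m)
    {
        if (m == ParsingMode::Tolerant)
            return s;     // assumed to be already percent-encoded
        std::string out;  // Decoded: raw data, so escape each literal '%'
        for (char c : s)
            out += (c == '%') ? std::string("%25") : std::string(1, c);
        return out;
    }

    // Single decoding step shared by every getter.
    static std::string fetch(const std::string &s, ComponentFormat f)
    {
        if (f == ComponentFormat::Encoded)
            return s;
        std::string out;
        for (std::size_t i = 0; i < s.size(); ++i) {
            int hi = i + 2 < s.size() ? hexValue(s[i + 1]) : -1;
            int lo = i + 2 < s.size() ? hexValue(s[i + 2]) : -1;
            if (s[i] == '%' && hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
            } else {
                out += s[i];
            }
        }
        return out;
    }

    static int hexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    std::string m_path;
    std::string m_password;
};
```

Supporting another component would only need a member plus a setter/getter pair forwarding to store() and fetch(), which is what keeps the encoding and decoding logic in one place.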
Pros compared to proposal 1:
- less API clutter
- centralised handling of the decoding and encoding
- also allows for StrictMode setting of components and error reporting
Cons compared to proposal 1:
- less discoverable, and it is harder to document that the option is needed in
cases like the FTP one above.
Which one shall it be?
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden