[Development] QUrl fully-decoded path API

shane.kearns at accenture.com
Thu May 17 17:13:15 CEST 2012


I favour proposal 2.
The legacy URL schemes (ftp and file for what we implement) need to deal with fully decoded fragments.

The lower discoverability is not much of a con, since this API is mainly needed to implement a protocol rather than to use one.
--


> -----Original Message-----
> From: development-bounces+shane.kearns=accenture.com at qt-project.org
> [mailto:development-bounces+shane.kearns=accenture.com at qt-project.org]
> On Behalf Of Thiago Macieira
> Sent: 14 May 2012 17:29
> To: development at qt-project.org
> Subject: [Development] QUrl fully-decoded path API
>
> Hello
>
> David and I have been discussing for the past week one of the
> consequences of QUrl operating on encoded data only in Qt 5. There are
> a few use-cases where a fully-decoded path is necessary.
>
> == Rationale ==
> (skip to proposal if you find this lengthy)
>
> I've already had to implement the full decoding so that
> QUrl::toLocalFile() would work. But the same process might be necessary
> for non-local files. For example, from qnetworkaccessftpbackend.cpp:
>
>         if (operation() == QNetworkAccessManager::GetOperation) {
>             setCachingEnabled(true);
>             ftp->get(url().path(), 0, type);
>         } else {
>             ftp->put(uploadDevice, url().path(), type);
>         }
>
> If the URL contained a percent-encoded character that QUrl::path()
> doesn't decode, it will remain in the path and be sent to the FTP
> server. More than likely, that's not what was intended. The characters
> that QUrl does not decode under any circumstances are:
>
>       - control characters between 0x00 and 0x1F
>       - the percent character itself (0x25)
>       - the delete control character (0x7F)
>       - high-bit byte sequences that cannot be decoded as UTF-8
>
> Especially because of the last category, the percent sign can never be
> decoded. Those arbitrary binary sequences can appear anywhere in the
> URL's user info, path, query or fragment, and the code dealing with
> them is common.
> Moreover, encoded paths are the correct way to deal with paths when
> dealing with a URL's most common use: HTTP and the web.
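
To make the rule above concrete, here is a minimal sketch in plain C++ (no Qt) of a decoder that leaves the listed categories percent-encoded unless full decoding is requested. The names decodePercent and staysEncoded are made up for illustration, and the sketch simplifies by keeping every high-bit byte encoded rather than checking for valid UTF-8. It is a toy model, not QUrl's implementation.

```cpp
#include <cstddef>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: bytes that the partial decoder never decodes.
// Assumption: all high-bit bytes stay encoded, valid UTF-8 or not.
static bool staysEncoded(unsigned char b)
{
    return b <= 0x1F || b == 0x25 || b >= 0x7F;
}

// Decodes %XX sequences. With full == false, the bytes flagged by
// staysEncoded() are left as they were written in the input.
std::string decodePercent(const std::string &in, bool full)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                const unsigned char b = static_cast<unsigned char>(hi * 16 + lo);
                if (full || !staysEncoded(b)) {
                    out += static_cast<char>(b);
                    i += 2;
                    continue;
                }
            }
        }
        out += in[i]; // literal character, or an escape left untouched
    }
    return out;
}
```

With the partial mode, "/a%20b%25c%0A" keeps "%25" and "%0A" in place, which is the string an FTP backend would send if it used the partially decoded path as is.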
>
> (as a twist of fate, the HTTP backend doesn't use QUrl::path(), but
> QUrl::toString(QUrl::RemoveAuthority | QUrl::RemoveFragment) so it gets
> both the path and the query)
>
> The same applies to setting the path. Often, the data comes in a
> decoded form from other contexts, such as user input or an FTP
> directory listing. For those, encoding is necessary, like
> QUrl::fromLocalFile does.
>
>     url.setPath(deslashified.replace(QLatin1Char('%'), QStringLiteral("%25")));
>
> As David pointed out in an email to me, no one without thorough
> training in URLs will be able to write this code properly.
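
As an illustration of why the escaping step matters, here is a small sketch in plain C++ (no Qt) that mirrors the replace() call above. escapePercent is a hypothetical helper name, and it handles only the '%' character, on the assumption that the setter takes care of everything else.

```cpp
#include <string>

// Escapes every literal '%' in already-decoded input so that a setter
// expecting encoded data does not mistake "%20" in a file name for a space.
std::string escapePercent(const std::string &decoded)
{
    std::string out;
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

A directory entry literally named "report%20final.txt" becomes "report%2520final.txt"; without the escape, it would later decode to "report final.txt", a different file.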
>
> == Proposal 1 ==
>
> Add QUrl::decodedPath() and QUrl::setDecodedPath(), operating on
> QString, which do the necessary encoding and decoding.
> QUrl::fromLocalFile will call setDecodedPath() instead of doing
> the work above, and QUrl::toLocalFile's extra decoder will be moved to
> decodedPath().
>
> The documentation will need to be updated to indicate when to use each.
>
> == Problem 2 ==
>
> The same problem that applies to the path can potentially apply to
> other components of the URL: user name, password, fragment and query.
> For example, imagine using the following randomly generated password
> (I generated it using KeePassX):
>
>       url.setPassword("}}>b9o%kR(");
>
> The above will trigger the tolerant mode's corrector, which will
> transform the '%' into "%25". However, when trying to send the password
> to the server, for example using QAuthenticator, we might make this
> mistake (copied from qnetworkaccessmanager.cpp):
>
>         // if credentials are included in the url, then use them
>         if (!url.userName().isEmpty()
>             && !url.password().isEmpty()) {
>             authenticator->setUser(url.userName());
>             authenticator->setPassword(url.password());
> [by the way, this code should test if !userInfo().isEmpty(), to catch
> empty passwords too]
>
> Then we end up setting the password to "}}>b9o%25kR(", which is very
> likely to be incorrect.
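
The correction step described above can be sketched in plain C++ (no Qt). tolerantFix is a hypothetical name, and the rule it applies (escape a '%' only when it is not followed by two hex digits) is an assumption about what a tolerant corrector does, not a copy of QUrl's code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed tolerant-mode rule: a '%' that does not start a valid %XX
// escape is rewritten as "%25"; valid escapes pass through untouched.
std::string tolerantFix(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool validEscape = in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (in[i] == '%' && !validEscape)
            out += "%25";
        else
            out += in[i];
    }
    return out;
}
```

A getter that returns this stored form unchanged hands the "%25" on to the authenticator, which is the mistake shown in the snippet above.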
>
> == Proposal 2 ==
>
> So instead of adding decodedPath(), decodedUserName(),
> decodedPassword(), etc. and cluttering the Qt 5 QUrl API like the Qt 4
> one was, there's a separate proposal:
>
>  - add an option to QUrl::ComponentFormattingOptions to execute full
>    decoding
>  - add a new value to QUrl::ParsingMode to indicate fully decoded
>    parsing
>  - modify all setters so that they take QUrl::ParsingMode too (like
>    QUrl::setUrl)
>
> These new options should not be allowed in QUrl's constructor,
> QUrl::setUrl, QUrl::url, toString and toEncoded, for which full
> decoding creates ambiguous data (the root flaw in QUrl in Qt 4).
>
> Pros over proposal 1:
>  - less API clutter
>  - centralised handling of the decoding and encoding
>  - also allows for StrictMode setting of components and error reporting
>
> Cons over proposal 1:
>  - less discoverable and harder to document that the option is needed
>    in cases like the FTP one above.
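
For comparison, here is a rough sketch in plain C++ (no Qt) of what proposal 2 looks like from the caller's side: one setter taking a parsing mode and one getter taking a formatting option. MiniUrl, ParsingMode::Decoded and Formatting::FullyDecoded are placeholder names invented for this sketch, not the eventual Qt API. The storage model is simplified on purpose so that '%' is the only character ever encoded.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two proposed options; not Qt's real enums.
enum class ParsingMode { Tolerant, Decoded };
enum class Formatting { Encoded, FullyDecoded };

class MiniUrl
{
public:
    // Decoded mode treats the input as raw data and escapes every '%'.
    // Tolerant mode escapes only a '%' that does not start a valid %XX.
    void setPassword(const std::string &value, ParsingMode mode)
    {
        m_password.clear();
        for (std::size_t i = 0; i < value.size(); ++i) {
            const bool escape = value[i] == '%'
                && (mode == ParsingMode::Decoded || !isEscape(value, i));
            if (escape)
                m_password += "%25";
            else
                m_password += value[i];
        }
    }

    // Encoded returns the stored form; FullyDecoded decodes every %XX.
    std::string password(Formatting fmt) const
    {
        if (fmt == Formatting::Encoded)
            return m_password;
        std::string out;
        for (std::size_t i = 0; i < m_password.size(); ++i) {
            if (isEscape(m_password, i)) {
                out += static_cast<char>(hex(m_password[i + 1]) * 16
                                         + hex(m_password[i + 2]));
                i += 2;
            } else {
                out += m_password[i];
            }
        }
        return out;
    }

private:
    static int hex(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static bool isEscape(const std::string &s, std::size_t i)
    {
        return s[i] == '%' && i + 2 < s.size()
            && hex(s[i + 1]) >= 0 && hex(s[i + 2]) >= 0;
    }

    std::string m_password; // always held in encoded form
};
```

Setting "}}>b9o%kR(" in the decoded mode and reading it back fully decoded returns the original string, while the encoded getter still shows "%25". The same pair of options would serve the path, user name, query and fragment without adding one function per component.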
>
> Which one shall it be?
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
>   Software Architect - Intel Open Source Technology Center
>      Intel Sweden AB - Registration Number: 556189-6027
>      Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
