[Development] QUrl fully-decoded path API

lars.knoll at nokia.com
Tue May 15 09:02:10 CEST 2012


Hi Thiago,

Having read through the whole thread, I am leaning towards proposal 2.
It might be slightly more difficult to document, but it has less API
clutter and is probably easier for us to maintain in the longer term.

Cheers,
Lars

On 5/14/12 6:29 PM, "ext Thiago Macieira" <thiago.macieira at intel.com>
wrote:

>Hello
>
>David and I have been discussing for the past week one of the
>consequences of QUrl operating on encoded data only in Qt 5. There are
>a few use-cases where a fully-decoded path is necessary.
>
>== Rationale ==
>(skip to proposal if you find this lengthy)
>
>I've already had to implement the full decoding so that
>QUrl::toLocalFile() would work. But the same process might be necessary
>for non-local files. For example, from qnetworkaccessftpbackend.cpp:
>
>        if (operation() == QNetworkAccessManager::GetOperation) {
>            setCachingEnabled(true);
>            ftp->get(url().path(), 0, type);
>        } else {
>            ftp->put(uploadDevice, url().path(), type);
>        }
>
>If the URL contained a percent-encoded character that QUrl::path()
>doesn't decode, it will remain in the path and be sent to the FTP
>server. More than likely, that's not what was intended. The characters
>that QUrl does not decode under any circumstances are:
>
>	- control characters between 0x00 and 0x1F
>	- the percent character itself (0x25)
>	- the delete control character (0x7F)
>	- high-bit byte sequences that cannot be decoded as UTF-8
>
>Especially because of the last category, the percent sign can never be
>decoded. Those arbitrary binary sequences can appear anywhere in the
>URL's user info, path, query or fragment, and the code dealing with
>them is common. Moreover, encoded paths are the correct way to deal
>with paths when dealing with a URL's most common use: HTTP and the web.
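
The rule in the list above can be sketched in plain C++ (standard library only, not Qt, so that it stands alone). This is only an illustration under assumptions: prettyDecode and hexValue are hypothetical names, reserved delimiters are ignored, and every high-bit byte is left encoded instead of being checked for valid UTF-8 as QUrl is described as doing.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Qt code: value of a hex digit, or -1.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX only when the result is printable ASCII other than '%'.
// Control characters (0x00-0x1F), '%' (0x25), DEL (0x7F) and, as a
// simplification, all high-bit bytes stay percent-encoded.
std::string prettyDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                int byte = hi * 16 + lo;
                bool keepEncoded = byte < 0x20 || byte == 0x25 || byte >= 0x7F;
                if (!keepEncoded) {
                    out += static_cast<char>(byte);
                    i += 2;          // skip the two hex digits
                    continue;
                }
            }
        }
        out += in[i];                // copy everything else unchanged
    }
    return out;
}
```

With this sketch, "/a%20b%25c%0A" becomes "/a b%25c%0A": the space is decoded, while the percent sign and the line feed stay encoded and would reach the FTP server in that form.
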
>
>(as a twist of fate, the HTTP backend doesn't use QUrl::path(), but
>QUrl::toString(QUrl::RemoveAuthority | QUrl::RemoveFragment) so it gets
>both the path and the query)
>
>The same applies to setting the path. Often, the data comes in a
>decoded form from other contexts, such as user input or an FTP
>directory listing. For those, encoding is necessary, like
>QUrl::fromLocalFile does.
>
>    url.setPath(deslashified.replace(QLatin1Char('%'), QStringLiteral("%25")));
>
>As David pointed out in an email to me, no one without full training in
>URLs will be able to write that code properly.
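
To make the setPath() line above concrete, here is a rough standard-C++ equivalent of the replace() call. escapePercent is a hypothetical name, and the sketch covers only the '%' escaping mentioned in the mail, nothing else a real setter might need.

```cpp
#include <string>

// Hypothetical stand-in for the QString::replace() call quoted above:
// treat the input as fully decoded, so every '%' is a literal character
// that has to be written as "%25" before it goes into the URL.
std::string escapePercent(const std::string &decoded)
{
    std::string out;
    out.reserve(decoded.size());
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

The point of the escaping: a file that is literally named "a%20b" in an FTP listing has to be stored as "a%2520b". If it were passed through unescaped, it would later be read back as "a b", which is a different file.
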
>
>== Proposal 1 ==
>
>Add QUrl::decodedPath() and QUrl::setDecodedPath(), operating on
>QString, which do the necessary encoding and decoding.
>QUrl::fromLocalFile will call the new setter instead of doing the work
>above, and QUrl::toLocalFile's extra decoder will be moved to the new
>getter.
>
>The documentation will need to be updated to indicate when to use each.
>
>== Problem 2 ==
>
>The same problem that applies to the path can potentially apply to
>other components of the URL: user name, password, fragment and query.
>For example, imagine using the following randomly generated password
>(I generated it using KeePassX):
>
>	url.setPassword("}}>b9o%kR(");
>
>The above will trigger the tolerant mode's corrector, which will
>transform the '%' into "%25". However, when trying to send the password
>to the server, for example using QAuthenticator, we might make this
>mistake (copied from qnetworkaccessmanager.cpp):
>
>        // if credentials are included in the url, then use them
>        if (!url.userName().isEmpty()
>            && !url.password().isEmpty()) {
>            authenticator->setUser(url.userName());
>            authenticator->setPassword(url.password());
>[by the way, this code should test if !userInfo().isEmpty(), to catch
>empty passwords too]
>
>Then we end up setting the password to "}}>b9o%25kR(", which is very
>likely to be incorrect.
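
The correction step described above can be approximated as follows. This is a guess at the behaviour from the description in the mail, not QUrl's actual code; tolerantFix and isHex are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for 0-9, a-f, A-F.
static bool isHex(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
}

// Sketch of a tolerant-mode style correction: a '%' that does not start
// a valid %XX escape is assumed to be a literal percent sign and is
// rewritten as "%25". A '%' followed by two hex digits is left alone.
std::string tolerantFix(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool validEscape = in[i] == '%' && i + 2 < in.size()
                           && isHex(in[i + 1]) && isHex(in[i + 2]);
        if (in[i] == '%' && !validEscape)
            out += "%25";
        else
            out += in[i];
    }
    return out;
}
```

For the password above, '%' is followed by 'k', which is not a hex digit, so the stored component becomes "}}>b9o%25kR(". A getter that returns the stored form unchanged then hands that string, not the original secret, to the authenticator.
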
>
>== Proposal 2 ==
>
>So instead of adding decodedPath(), decodedUserName(),
>decodedPassword(), etc. and cluttering the Qt 5 QUrl API the way the
>Qt 4 one was, there's a separate proposal:
>
> - add an option to QUrl::ComponentFormattingOptions to execute full
>   decoding
> - add a new value to QUrl::ParsingMode to indicate full decoded parsing
> - modify all setters so that they take QUrl::ParsingMode too (like
>   QUrl::setUrl)
>
>These new options should not be allowed in QUrl's constructor,
>QUrl::setUrl, QUrl::url, toString and toEncoded, for which full
>decoding creates ambiguous data (the root flaw in QUrl in Qt 4).
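
For the sake of discussion, the shape of proposal 2 might look roughly like the toy class below. Everything here is hypothetical: MiniUrl, the two enums and their values are made-up names in standard C++, tolerant mode is reduced to storing the input unchanged, and only one component is shown.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names throughout; this is not the QUrl API.
enum class ParsingMode { Tolerant, Decoded };
enum class Formatting { Encoded, FullyDecoded };

class MiniUrl
{
public:
    // Decoded mode: every '%' in the input is literal and gets escaped.
    // Tolerant mode is simplified here to storing the input unchanged.
    void setPassword(const std::string &value,
                     ParsingMode mode = ParsingMode::Tolerant)
    {
        m_password.clear();
        for (char c : value) {
            if (c == '%' && mode == ParsingMode::Decoded)
                m_password += "%25";
            else
                m_password += c;
        }
    }

    // The component is stored encoded; full decoding is done on request.
    std::string password(Formatting format = Formatting::Encoded) const
    {
        if (format == Formatting::Encoded)
            return m_password;
        std::string out;
        for (std::size_t i = 0; i < m_password.size(); ++i) {
            if (m_password[i] == '%' && i + 2 < m_password.size()) {
                int hi = hexValue(m_password[i + 1]);
                int lo = hexValue(m_password[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out += static_cast<char>(hi * 16 + lo);
                    i += 2;      // skip the two hex digits
                    continue;
                }
            }
            out += m_password[i];
        }
        return out;
    }

private:
    static int hexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    std::string m_password;      // always kept in encoded form
};
```

In this sketch the caller states explicitly that the input is decoded, so even a password such as "a%41b" round-trips through setPassword(..., ParsingMode::Decoded) and password(Formatting::FullyDecoded) unchanged, and the encoding logic lives in one place instead of in a second set of accessors.
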
>
>Pros over proposal 1:
> - less API clutter
> - centralised handling of the decoding and encoding
> - also allows for StrictMode setting of components and error reporting
>
>Cons over proposal 1:
> - less discoverable and harder to document that the option is needed
>   in cases like the FTP one above.
>
>Which one shall it be?
>
>-- 
>Thiago Macieira - thiago.macieira (AT) intel.com
>  Software Architect - Intel Open Source Technology Center
>     Intel Sweden AB - Registration Number: 556189-6027
>     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden



