[Development] Deprecating QFile::encodeName/decodeName

Thiago Macieira thiago.macieira at intel.com
Wed Jun 6 17:36:30 CEST 2012


On Wednesday, 6 June 2012 16.51.14, João Abecasis wrote:
> > From there, we come to the conclusion that the QString representing such a
> > file name must contain special processing instructions (e.g., one or
> > more special characters). One form of special processing instruction is
> > escaping each character, like URLs do. The problem with the approach of
> > escaping is what to do when the escape character occurs in a file name.
> > If that is a possibility, the escape character needs to be escaped by
> > itself (like "\\" for backslashes in C or "%25" for percents in URLs). If
> > we use this approach, then we will not interoperate properly with non-Qt
> > applications when this character happens.
> >
> > The only sane solution, then, is to use a character that has a very small
> > chance of ever being used or, better yet, a zero chance (I don't think
> > there's any). If that happens, then this character will be close to
> > "untypeable" on the terminal. Not a big loss, I'd say.
> 
> We could use some magic sequence. Windows, for instance, uses the "\\?\"
> prefix to support longer paths. We could use '<' and '>', which are rare
> but valid, we could give a specific meaning to sequences of 3 or more
> slashes.
> 
> I don't have a concrete solution at the moment.

I really think we should not use a character that commonly appears in file
names, and that includes <, >, commas, percents, backslashes, spaces, etc. It
needs to be a Unicode character that has a close-to-zero chance of being
intentionally used.

I recommend selecting one or two characters from the Unicode private use area
for this. We could use a noncharacter (such as U+FDD0) instead, but that would
cause problems elsewhere. For one, if you add such a path to QTextBrowser, it
might do weird things. For another, such characters are dropped by the UTF-8
encoder and decoder, aren't allowed in D-Bus, etc.
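
To make the difference between the two options concrete, here is a minimal
sketch in plain C++ (no Qt) that classifies a code point as private-use or
noncharacter, using the ranges the Unicode standard defines. The function
names and the candidate U+E000 are assumptions for illustration only; this
thread does not settle on a specific code point.

```cpp
#include <cassert>

// Noncharacters: U+FDD0..U+FDEF, plus the last two code points of every plane.
constexpr bool isNoncharacter(char32_t c) {
    return (c >= 0xFDD0 && c <= 0xFDEF) || (c <= 0x10FFFF && (c & 0xFFFE) == 0xFFFE);
}

// Private use: U+E000..U+F8FF in the BMP, plus planes 15 and 16
// without their trailing noncharacters.
constexpr bool isPrivateUse(char32_t c) {
    return (c >= 0xE000 && c <= 0xF8FF)
        || (c >= 0xF0000 && c <= 0xFFFFD)
        || (c >= 0x100000 && c <= 0x10FFFD);
}

// Hypothetical candidate, for illustration only; the thread picks no code point.
constexpr char32_t kCandidateEscape = 0xE000;

// Usage: a private-use candidate passes, the noncharacter U+FDD0 does not.
void checkCandidates() {
    assert(isPrivateUse(kCandidateEscape) && !isNoncharacter(kCandidateEscape));
    assert(isNoncharacter(0xFDD0) && !isPrivateUse(0xFDD0));
}
```

A private-use code point survives UTF-8 round trips like any other character,
which is the property the noncharacter option lacks.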

This character will be all but "untypeable" on the command line. I don't think
we care, though, since Qt applications are seldom launched from the command
line and, besides, if the user sees the broken file name anyway (in either
form), the user is likely to fix the problem.

> > If it was named "βιογραφικό σημείωμα.txt" in ISO-8859-7, the QString 
> > representation would be:
> >       /home/foo/έγγραφα/<escape>âéïãñáöéêü óçìåßùìá.txt
> >
> > That has the drawback of being hard to use when it comes to path
> > manipulation.  Appending, prepending, extracting or inserting text could
> > have unexpected consequences.
> 
> I think any such scheme should support both absolute and relative paths and
> should allow a relative path to be combined with an absolute path with:
> 
>     absolute-path + '/' + relative-path

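If you append a slash, the encoding unshifts back to normal, so that case
works. But imagine someone appending a suffix to the escaped component: the
new text lands inside the shifted region. Thankfully, non-ASCII suffixes and
extensions are really rare.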
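
The shift behaviour can be sketched in plain C++ (no Qt). This is only an
illustration under stated assumptions, not what QFile does: the marker U+E000,
the function names, and the choice to also escape a valid UTF-8 component that
happens to start with the marker are all hypothetical. A component that is not
valid UTF-8 is stored as the marker followed by one code point per raw byte,
and a slash ends the raw run.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical marker from the private use area; the thread names no code point.
constexpr char32_t kEscape = 0xE000;

// Strictly decode one path component as UTF-8; false on any malformed input.
static bool decodeUtf8(const std::string &in, std::u32string &out) {
    out.clear();
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t extra;
        char32_t cp, min;
        if (b < 0x80)                { extra = 0; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;
        if (extra > in.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}

// Append one code point as UTF-8 (no validation; enough for this sketch).
static void appendUtf8(char32_t cp, std::string &out) {
    if (cp < 0x80) { out += static_cast<char>(cp); return; }
    const int extra = cp < 0x800 ? 1 : cp < 0x10000 ? 2 : 3;
    static const unsigned char lead[] = {0, 0xC0, 0xE0, 0xF0};
    out += static_cast<char>(lead[extra] | (cp >> (6 * extra)));
    for (int k = extra - 1; k >= 0; --k)
        out += static_cast<char>(0x80 | ((cp >> (6 * k)) & 0x3F));
}

// Bytes to text, one '/'-separated component at a time.
std::u32string decodeName(const std::string &bytes) {
    std::u32string result;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = bytes.find('/', start);
        const std::string comp = bytes.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        std::u32string decoded;
        if (decodeUtf8(comp, decoded) && (decoded.empty() || decoded[0] != kEscape)) {
            result += decoded;
        } else {
            result += kEscape;  // shift: the rest of this component is raw bytes
            for (unsigned char b : comp) result += static_cast<char32_t>(b);
        }
        if (end == std::string::npos) break;
        result += U'/';
        start = end + 1;
    }
    return result;
}

// Text to bytes; false if a raw run contains something that is not a byte.
bool encodeName(const std::u32string &name, std::string &out) {
    out.clear();
    bool raw = false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char32_t c = name[i];
        if (c == U'/') { raw = false; out += '/'; }  // a slash unshifts
        else if (c == kEscape && (i == 0 || name[i - 1] == U'/')) raw = true;
        else if (raw) { if (c > 0xFF) return false; out += static_cast<char>(c); }
        else appendUtf8(c, out);
    }
    return true;
}

void example() {
    const std::string bytes = "/home/foo/\xE2\xE9\xEF.txt";  // last part: ISO-8859-7 bytes
    std::u32string name = decodeName(bytes);
    std::string back;
    assert(name[10] == kEscape);                      // the component is marked
    assert(encodeName(name, back) && back == bytes);  // lossless round trip
    name += char32_t(0x3B1);                          // append a Greek suffix
    assert(!encodeName(name, back));                  // it lands inside the raw run
}
```

The last two lines of `example()` show the suffix problem: text appended
directly to an escaped component is interpreted as raw bytes, so anything
outside U+0000..U+00FF cannot be encoded at all.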

> > Limitations:
> > a) Qt-only, I don't expect anyone else to use such file names
> > b) if encodeName() isn't used properly, it leads to a bad encoding of the
> > file name onto 8-bit. Applications dealing with the filesystem need to
> > be extra careful so as to not show two representations of the same file.
> > c) for that matter, it's possible to produce an escaped form that matches
> > a regular file name
> > d) double representations are often a source of security issues if not 
> > handled carefully (cf. overlong sequences in UTF-8)
> 
> I don't see a) as such a big problem, since currently Qt can't even handle
> such file names. As for b) I think ideally we'd come up with something that
> makes the use of encode/decodeName invisible and doesn't require users to
> register their own encoding/decoding functions. c) is what we want to
> minimize.
> 
> As for d), if we make it all transparent and handled in a seamless way in Qt
> the problem that remains is how those paths interoperate with other
> applications and user code. It really helps to minimize c).

I'm not sure I agree with your dismissal of d). I'd like to see more research
into this topic first.

> On the other hand we already have Qt-only paths in resource files and
> QDir::searchPaths(). We could easily use a well-known prefix for the
> special paths: url-encoded:/usr/joao/R%E9sum%E9.txt, which only supports
> absolute paths, but would already enable all items in my wish list.
>
> > As you can see, I didn't come up with this today. I've known these
> > alternatives for years. I don't think they're worth our time.

Search paths and the filesystem engines are misfeatures. One is gone, the other 
not yet. They are potential security issues too.

Anyway, what I recommend for now:

 1) immediately, de-inline QFile::decodeName and QFile::encodeName
 2) un-deprecate them and update the text in changes-5.0.0
 3) make QProcess use QFile::encodeName for its arguments (no-op right now)
 4) make QCoreApplication parse its arguments using QFile::decodeName (no-op
    right now)
 5) do the same for Laszlo's command-line parser class

Later, we can decide whether to add escaping to those functions.
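
The point of steps 1 and 3 to 5 is that every conversion between bytes and
text goes through one pair of out-of-line functions, so escaping can be added
later in a single place. A rough sketch in plain C++ (no Qt) follows.
Everything in it is hypothetical: the `sketch` namespace, the helper names,
and the Latin-1 placeholder bodies, which merely stand in for the locale codec.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Declarations only, as a header would carry them once de-inlined.
std::u32string decodeName(const std::string &localFileName);
std::string encodeName(const std::u32string &fileName);

// Steps 4 and 5: argument parsing decodes through the single choke point.
std::vector<std::u32string> decodeArguments(int argc, const char *const *argv) {
    std::vector<std::u32string> args;
    for (int i = 0; i < argc; ++i) args.push_back(decodeName(argv[i]));
    return args;
}

// Step 3: process launching encodes through the same choke point.
std::vector<std::string> encodeArguments(const std::vector<std::u32string> &args) {
    std::vector<std::string> out;
    for (const std::u32string &a : args) out.push_back(encodeName(a));
    return out;
}

// Out-of-line definitions. Placeholder: Latin-1, one byte per code point.
// Only these two bodies would change if escaping were added later.
std::u32string decodeName(const std::string &s) {
    std::u32string r;
    for (unsigned char b : s) r += static_cast<char32_t>(b);
    return r;
}

std::string encodeName(const std::u32string &s) {
    std::string r;
    for (char32_t c : s) r += c <= 0xFF ? static_cast<char>(c) : '?';
    return r;
}

// Usage: arguments survive the trip in and back out unchanged.
void example() {
    const char *argv[] = {"app", "notes.txt"};
    const std::vector<std::string> back = encodeArguments(decodeArguments(2, argv));
    assert(back.size() == 2 && back[1] == "notes.txt");
}

} // namespace sketch
```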

However, I cannot agree with bringing the setter functions back. I do agree 
with removing them completely, though.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden