[Development] Deprecating QFile::encodeName/decodeName

João Abecasis joao.abecasis at nokia.com
Tue Jun 5 13:20:36 CEST 2012


Hi,

I know I'm a bit late to this party; apologies for the disruption at
this point in the game. (There's a TL;DR/Summary at the end.)

Thiago Macieira wrote:
> I'd like to deprecate those two functions, as well as the setters
> QFile::setEncodingFunction and setDecodingFunction. I'd like to go
> further and make the setters no-op, and the actual functions be just
> an inline wrapper for QString::to/fromLocal8Bit. However, I'll leave
> the encode/decodeName functions as part of the Qt 5 API.

If they're no-op and we have no intention of reviving them, then this is
the right time to completely remove them. Carrying around this cruft
for no good reason doesn't help anyone.

Fwiw, I support dropping those functions. Still, I think your rationale
is wrong. Let's go over it.

> Rationale:
> 
> Those two functions have been present in Qt since at least Qt 2 (see
> [1] and [2]). Their purpose was to convert the UTF-16-based QString
> filenames to the local filesystem encodings on Unix systems. Back in
> those days, some people had file names encoded with a different
> encoding than the locale -- for example, UTF-8 for the file names and
> Latin 1 or 15 for the data, or vice versa.
> 
> For that reason, Qt developers should use QFile::encodeName when
> converting a QString for use in the POSIX C function calls, and
> similarly use QFile::decodeName when converting data from the POSIX
> calls to QString.
> 
> That concept is broken.

I would not go as far as saying that the concept is broken, but it
definitely isn't portable. One big issue is that nowadays we don't use
an 8-bit encoding on Windows and those functions assume QString <->
QByteArray conversions.

> The POSIX calls are not the only source of strings. There are more,
> like for example the command-line and data found in files. When
> parsing the command-line, QCoreApplication applies
> QString::fromLocal8Bit indiscriminately, since it doesn't know which
> arguments refer to files and which ones don't. If you're reading from
> a file, you're likely to do the same or to use QTextStream, which
> amounts to the same problem. 
> 
> Moreover, if you're going to print a file name to a file, what do you
> use? When communicating with other programs, what encoding should be
> used? How about reporting information on the terminal (stdout and
> stderr)? Similar to QCoreApplication, when you set file names as part
> of the arguments in QProcess indiscriminately applies toLocal8Bit,
> since it doesn't know what is a file name and what isn't.

Here, you are mixing different things: paths on the filesystem, command
line arguments and configuration settings. In the end it's all about
strings, and each must be interpreted according to the context it lives
in, not what it represents. I find it helpful to think of these things in terms
of both multi-user systems and users that switch locales (I sometimes do
this, myself).


File system paths are meant to survive reboots, and changes of user and
locale. As such, they should follow a (system-wide) predefined encoding.
Typically, they're no more than a null-terminated sequence of bytes
(2-byte units on Windows), where a reserved few are actually meaningful
(say, '/', ':' and '\0').

Configuration settings will typically belong to an application and may
or may not be user specific. They're at least meant to survive
application restarts. The responsibility for defining encoding for these
lies with the application that owns them. System-wide settings used by
multiple applications typically follow some standard or de facto
convention for encodings.

Arguments specified on the command line are short-lived. They belong to
the current session and should be interpreted according to the current
locale. Applications should assume the user typed those straight from
the shell.

[ If you save the command line in a configuration (say .desktop) file,
then your shell should read it according to the rules of that file
format and convert it to the current locale before invoking execv.

If you further have a file system path that you want to pass as a
command line argument and decide to save that in a configuration file...
well, you do the math ;-) ]


And now for the problem encode/decodeName pretends to solve...

Whatever the encoding that's being used for command line arguments and
configuration settings, applications should support an escaping or
encoding mechanism to allow access to file system paths that can't be
directly represented in their own encodings. Since we don't provide one
ourselves, we need to enable applications to do it.

The other way around also holds. When iterating the filesystem we may
come across file names that we don't know how to convert to strings.
What happens then is that as soon as you grab a QString representation
from Qt it'll be useless. We don't offer reversible conversion, unless
the local 8-bit encoding happens to support it.
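
To make the loss concrete, here is a toy model in plain C++. It is not
Qt's conversion code: lossyDecodeAscii is a hypothetical helper, and it
assumes an ASCII-only locale in which every undecodable byte becomes
'?'. Under that assumption, two distinct on-disk names collapse into the
same text, so the text can no longer be used to reach either file.

```cpp
#include <string>

// Toy stand-in for a lossy 8-bit-to-text conversion. NOT Qt code.
// Assumption: ASCII-only locale, undecodable bytes are replaced by '?'.
std::string lossyDecodeAscii(const std::string &rawName)
{
    std::string text;
    for (unsigned char c : rawName) {
        // Bytes below 0x80 are ASCII and survive; anything else is lost.
        text += (c < 0x80) ? static_cast<char>(c) : '?';
    }
    return text;
}
```

With this model, the raw names "r\xE9sum\xE9.txt" and "r\xE8sum\xE8.txt"
both decode to "r?sum?.txt", and nothing in that string tells you which
bytes to hand back to the file system.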


> Therefore, I call this concept broken. There's only one possible
> encoding for the file system and it's the locale's encoding.

No, the concept is not broken. The implementation we had failed to
adequately support the cross-platform use case and, in a way, missed the
point. It didn't support Unicode Windows, I don't think it was being
used on Mac, and elsewhere there were spots that didn't consistently use
the API (I hope we fixed most of those in the 4.8 series, though).

> I will make the change and submit for approval. I will not wait for
> the discussion on the mailing list because, quite frankly, I do not
> expect anyone to disagree. 
> 
> If you *do* disagree, please speak up and make sure you have very
> convincing arguments on why we should keep this functionality and how
> developers should address the problems I described above. If the
> community agrees to leaving them, we can revert my patches.

As I said, I'm ok with dropping the functionality as it exists now, but
also be aware that this is removing functionality we currently don't
support in any other way.

Longer term, I think part of the solution is to expose the internal
QFileSystemEntry API and to primarily use that for passing paths around.
It might be nice to offer a default escaping and encoding mechanism to
reversibly express arbitrary byte sequences in plain (ASCII) text. That
would remove the need for users to come up with their own solutions to a
real problem.
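
As a rough sketch of what such a reversible mechanism could look like,
the following plain C++ uses percent-style escaping. This is only one
possible choice and is an assumption here, not a proposal from this
thread; escapePath and unescapePath are hypothetical names and not part
of any Qt API. Every byte outside printable ASCII, plus '%' itself, is
written as %XX, so any byte sequence maps to ASCII text and back.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers, not Qt API. Percent-escaping is an assumed scheme.

// Write printable ASCII as-is; escape '%' and every other byte as %XX.
std::string escapePath(const std::string &raw)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : raw) {
        if (c >= 0x20 && c < 0x7F && c != '%') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Return the value of one hex digit, or -1 if it is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Reverse escapePath. Returns false on a truncated or malformed escape,
// in which case the content of 'raw' is unspecified.
bool unescapePath(const std::string &text, std::string &raw)
{
    raw.clear();
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '%') {
            raw += text[i];
            continue;
        }
        if (i + 2 >= text.size())
            return false; // not enough characters left for %XX
        const int hi = hexValue(text[i + 1]);
        const int lo = hexValue(text[i + 2]);
        if (hi < 0 || lo < 0)
            return false; // not a valid hex pair
        raw += static_cast<char>(hi * 16 + lo);
        i += 2; // skip the two hex digits just consumed
    }
    return true;
}
```

The escaped form is safe to store in a configuration file or pass on a
command line regardless of locale, and unescapePath recovers the exact
bytes to hand to the file system. Escaping '%' itself is what keeps the
mapping unambiguous.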


TL;DR / Summary

Thiago's patch removes from Qt the ability to (reversibly) manipulate
"funny" file system paths. I don't see a need to revert the patch, as
the existing functionality had limitations of its own, but we need to be
conscious of the regression in functionality.

Looking forward, we need to figure out how to properly close the feature
gap in a way that supports user needs.

Cheers,


João




