[Development] Deprecating QFile::encodeName/decodeName

Oswald Buddenhagen oswald.buddenhagen at nokia.com
Wed Jun 6 21:21:28 CEST 2012


On Wed, Jun 06, 2012 at 04:51:14PM +0200, ext João Abecasis wrote:
> Thiago Macieira wrote:
> > So you're asking that filenames be passed on the locale encoding (say, UTF-8) 
> > on the command-line, regardless of what the filesystem encoding is?
> 
> I see no other sane way, unless your application is able to take the byte sequences it gets without additional processing.
>
thiago's whole point is that most command line apps just assume that
they can do that.
but you know what? it doesn't matter. apps which assume 8-bit
pass-through are simply not suited for fs-encoding != locale, because
the user will have all kinds of problems with that anyway (starting with
the command line if 8-bit passthrough is simply impossible, as it is
with a terminal in utf-8 mode). no valid use case. period.

> > In fact, there is one more possible solution which stands a chance: forcing 
> > the problem onto the kernel. Make the entire userspace API be UTF-8 and have 
> > the kernel recode to the filesystem encoding as necessary. The problem with 
> > this solution is that it a) will suffer extreme resistance from kernel 
> > developers and other people who think of file names as "binary data" instead of 
> > human-readable text; and b) is no different from the other solution of 
> > enforcing the encoding.
> 
> Forcing this onto the Linux kernel would in the long term make the
> situation better for Linux users that don't receive files from any
> other OSs or kernel versions. It doesn't help everyone.
> 
every fs used by windows in the last 1.5 decades (vfat, ntfs, isofs with
joliet, udf) is utf-16 based. so anything you get out of the kernel is
inherently recoded already, and the fs drivers have respective mount
options (though there is no standardization of any kind). i.e., the
problem simply does not exist for usb sticks and similar, provided udev
& co. correctly feed the kernel with the locale when mounting media.
some of the 8-bit fs drivers also have recoding options, but notably
they are missing from the linux-native fses. it shouldn't be too hard to
add some generic 8-bit-to-8-bit recoding option, but i fear nobody may
care at this point.

for the places where the problem does exist for whatever reason, the
fs-encoding is therefore mountpoint-specific (where the "mountpoint" can
also exist in user space when we are talking about a virtual file
system, like an archive). have fun solving *that* inside qt ...

> What is a real problem in practice and the one that (in my mind)
> setEncodingFunction addresses is not so much that of switching
> encodings, but that of allowing an escaping mechanism to be plugged
> in. Done this way, such escaping would be not only Qt-specific, but
> potentially application specific. Still, it should enable a simple
> File Manager built on Qt to operate on all files it sees.  
> 
yes, this is what follows from the above.


On Wed, Jun 06, 2012 at 05:36:30PM +0200, ext Thiago Macieira wrote:
> On quarta-feira, 6 de junho de 2012 16.51.14, João Abecasis wrote:
> > We could use some magic sequence. Windows, for instance, uses the "\\?\"
> > prefix to support longer paths. We could use '<' and '>', which are rare
> > but valid, we could give a specific meaning to sequences of 3 or more
> > slashes.
> > 
> > I don't have a concrete solution at the moment.
> 
> I really think we should not use a character that is easily used on file names, 
> and that includes <, >, commas, percents, backslashes, spaces, etc. It needs 
> to be a Unicode character that has a close-to-zero chance of being 
> intentionally used.
> 
the problem is that this has a lower chance of surviving various
round-trips - something 7-bit-clean would be better.
that of course means using a trigger sequence which has almost-zero
chance to occur otherwise, say "@--" (no special chars of any shell, no
path separators of any os). to keep the thing halfway readable, do the
escaping segment-wise, and use url-encoding for the escaped segments.
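
a minimal sketch of that idea, under stated assumptions: the expected
encoding is utf-8, segments are split on '/', a segment is escaped when
it does not look like utf-8 or already starts with the trigger, and the
escaped form is the trigger followed by the url-encoded bytes. all
function names are hypothetical and nothing here is qt api.

```cpp
#include <string>

// Hypothetical trigger sequence, taken from the "@--" suggestion above.
static const std::string kTrigger = "@--";

// Loose UTF-8 check: verifies lead/continuation byte structure only.
// It does not reject every overlong or surrogate encoding.
bool looksLikeUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false;
        if (i + extra >= s.size()) return false;   // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// URL-encode every byte except ASCII letters, digits, '-', '.', '_', '~'.
std::string percentEncode(const std::string &s)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        const bool plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                           (c >= '0' && c <= '9') ||
                           c == '-' || c == '.' || c == '_' || c == '~';
        if (plain) { out += static_cast<char>(c); }
        else { out += '%'; out += hex[c >> 4]; out += hex[c & 0x0F]; }
    }
    return out;
}

int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Reverse of percentEncode; malformed "%" sequences are copied unchanged.
std::string percentDecode(const std::string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()) {
            const int hi = hexValue(s[i + 1]), lo = hexValue(s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += s[i];
    }
    return out;
}

// Apply fn to every '/'-separated segment and re-join with '/'.
template <typename Fn>
std::string mapSegments(const std::string &path, Fn fn)
{
    std::string out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t slash = path.find('/', start);
        const std::size_t len =
            (slash == std::string::npos) ? std::string::npos : slash - start;
        out += fn(path.substr(start, len));
        if (slash == std::string::npos) break;
        out += '/';
        start = slash + 1;
    }
    return out;
}

// Raw bytes from the file system -> 7-bit-clean, displayable form.
std::string escapePath(const std::string &raw)
{
    return mapSegments(raw, [](const std::string &seg) {
        const bool clash = seg.compare(0, kTrigger.size(), kTrigger) == 0;
        return (clash || !looksLikeUtf8(seg)) ? kTrigger + percentEncode(seg) : seg;
    });
}

// Displayable form -> raw bytes to hand back to the file system.
std::string unescapePath(const std::string &shown)
{
    return mapSegments(shown, [](const std::string &seg) {
        const bool escaped = seg.compare(0, kTrigger.size(), kTrigger) == 0;
        return escaped ? percentDecode(seg.substr(kTrigger.size())) : seg;
    });
}
```

segments which already start with the trigger get escaped too, so the
mapping stays reversible; everything else stays readable as-is.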

> Anyway, what I recommend for now:
> 
>  1) immediately, de-inline QFile::decodeName and QFile::encodeName
>  2) un-deprecate them and update the text in changes-5.0.0
>
well, why not.

>  3) make QProcess use QFile::encodeName for its arguments (no-op right now)
>  4) make QCoreApplication parse its arguments using QFile::decodeName (no-op 
> right now)
>  5) idem for Laszlo's command-line parser class
> 
no. see first paragraph. doing this would only increase the mess.

> Later, we can decide whether to add escaping to those functions.
> 
> However, I cannot agree with bringing the setter functions back. I do agree 
> with removing them completely, though.
> 
ack
