[Development] Deprecating QFile::encodeName/decodeName

lars.knoll at nokia.com
Thu Jun 7 09:18:16 CEST 2012



On 6/6/12 9:21 PM, "ext Oswald Buddenhagen" <oswald.buddenhagen at nokia.com>
wrote:

>On Wed, Jun 06, 2012 at 04:51:14PM +0200, ext João Abecasis wrote:
>> Thiago Macieira wrote:
>> > So you're asking that filenames be passed in the locale encoding
>> > (say, UTF-8) on the command line, regardless of what the filesystem
>> > encoding is?
>> 
>> I see no other sane way, unless your application is able to take the
>> byte sequences it gets without additional processing.
>>
>thiago's whole point is that most command line apps just assume that
>they can do that.
>but you know what? it doesn't matter. apps which assume 8-bit
>pass-through are simply not suited for fs-encoding != locale, because
>the user will have all kinds of problems with that anyway (starting with
>the command line if 8-bit passthrough is simply impossible, as it is
>with a terminal in utf-8 mode). no valid use case. period.
>
>> > In fact, there is one more possible solution which stands a chance:
>> > forcing the problem onto the kernel. Make the entire userspace API be
>> > UTF-8 and have the kernel recode to the filesystem encoding as
>> > necessary. The problem with this solution is that it a) will suffer
>> > extreme resistance from kernel developers and other people who think
>> > of file names as "binary data" instead of human-readable text; and
>> > b) is no different from the other solution of enforcing the encoding.
>> 
>> Forcing this onto the Linux kernel would in the long term make the
>> situation better for Linux users that don't receive files from any
>> other OSs or kernel versions. It doesn't help everyone.
>> 
>every fs used by windows in the last 1.5 decades (vfat, ntfs, isofs with
>joliet, udf) is utf-16 based. so anything you get out of the kernel is
>inherently recoded already, and the fs drivers have respective mount
>options (though there is no standardization of any kind). i.e., the
>problem simply does not exist for usb sticks and similar, provided udev
>& co. correctly feed the kernel with the locale when mounting media.
>some of the 8-bit fs drivers also have recoding options, but notably
>they are missing from the linux-native fses. it shouldn't be too hard to
>add some generic 8-bit-to-8-bit recoding option, but i fear nobody may
>care at this point.
>
>for the places where the problem does exist for whatever reason, the
>fs-encoding is therefore mountpoint-specific (where the "mountpoint" can
>also exist in user space when we are talking about a virtual file
>system, like an archive). have fun solving *that* inside qt ...
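The mountpoint dependence can be made concrete with a small C++17 sketch of the longest-prefix lookup such a scheme would need at minimum. The table, the function name `encodingForPath` and the encoding labels are hypothetical illustrations; nothing like this exists in Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table: mountpoint -> encoding used for names on that filesystem.
// Not a Qt API; it only shows what a per-mountpoint lookup involves.
using EncodingTable = std::map<std::string, std::string>;

// Returns the encoding of the longest mountpoint that contains `path`,
// or `fallback` if no mountpoint matches.
std::string encodingForPath(const EncodingTable &table, const std::string &path,
                            const std::string &fallback)
{
    std::string best = fallback;
    std::size_t bestLen = 0;
    for (const auto &[mountpoint, encoding] : table) {
        if (mountpoint.empty() || mountpoint.size() < bestLen)
            continue;
        if (path.compare(0, mountpoint.size(), mountpoint) != 0)
            continue;  // mountpoint is not a prefix of path
        // Only match at a component boundary: "/mnt/old" must not match "/mnt/older".
        const bool atBoundary = path.size() == mountpoint.size()
                || mountpoint.back() == '/' || path[mountpoint.size()] == '/';
        if (!atBoundary)
            continue;
        best = encoding;
        bestLen = mountpoint.size();
    }
    return best;
}
```

A virtual filesystem such as an archive would simply be one more entry in the table, which is exactly why the table cannot be a single global setting.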
>
>> What is a real problem in practice and the one that (in my mind)
>> setEncodingFunction addresses is not so much that of switching
>> encodings, but that of allowing an escaping mechanism to be plugged
>> in. Done this way, such escaping would be not only Qt-specific but
>> potentially application-specific. Still, it should enable a simple
>> file manager built on Qt to operate on all files it sees.
>> 
>yes, this is what follows from the above.
>
>
>On Wed, Jun 06, 2012 at 05:36:30PM +0200, ext Thiago Macieira wrote:
>> On Wednesday, 6 June 2012 16:51:14, João Abecasis wrote:
>> > We could use some magic sequence. Windows, for instance, uses the
>> > "\\?\" prefix to support longer paths. We could use '<' and '>', which
>> > are rare but valid, or we could give a specific meaning to sequences
>> > of 3 or more slashes.
>> > 
>> > I don't have a concrete solution at the moment.
>> 
>> I really think we should not use a character that is easily used in
>> file names, and that includes <, >, commas, percents, backslashes,
>> spaces, etc. It needs to be a Unicode character that has a
>> close-to-zero chance of being intentionally used.
>> 
>the problem is that this has a lower chance of surviving various
>round-trips - something 7-bit-clean would be better.
>that of course means using a trigger sequence which has almost-zero
>chance to occur otherwise, say "@--" (no special chars of any shell, no
>path separators of any os). to keep the thing halfway readable, do the
>escaping segment-wise, and use url-encoding for the escaped segments.
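The segment-wise scheme can be sketched in plain C++ on raw byte strings (this is not Qt code). Everything beyond the "@--" trigger and url-encoding is an assumption: a segment is escaped only when it is not valid UTF-8 or already starts with the trigger, only bytes >= 0x80 and '%' are percent-encoded inside an escaped segment, '/' is the only separator, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Trigger sequence from the proposal: no shell metacharacters, no path separators.
const std::string kTrigger = "@--";

// True if `s` is well-formed UTF-8 (no overlongs, surrogates or values > U+10FFFF).
bool isValidUtf8(const std::string &s)
{
    static const unsigned int minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = s[i];
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }
        if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;
        if (i + len > s.size())
            return false;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

bool hasTrigger(const std::string &seg) { return seg.compare(0, kTrigger.size(), kTrigger) == 0; }

// Readable segments pass through; others get the trigger plus %XX for high bytes.
std::string escapeSegment(const std::string &seg)
{
    if (isValidUtf8(seg) && !hasTrigger(seg))
        return seg;
    static const char hex[] = "0123456789ABCDEF";
    std::string out = kTrigger;
    for (unsigned char c : seg) {
        if (c >= 0x80 || c == '%') {
            out += '%'; out += hex[c >> 4]; out += hex[c & 0xF];
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}

int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Strips the trigger and decodes %XX; malformed escapes are kept literally.
std::string unescapeSegment(const std::string &seg)
{
    if (!hasTrigger(seg))
        return seg;
    std::string out;
    for (std::size_t i = kTrigger.size(); i < seg.size(); ++i) {
        if (seg[i] == '%' && i + 2 < seg.size()) {
            const int hi = hexValue(seg[i + 1]), lo = hexValue(seg[i + 2]);
            if (hi >= 0 && lo >= 0) { out += static_cast<char>(hi * 16 + lo); i += 2; continue; }
        }
        out += seg[i];
    }
    return out;
}

// Applies `f` to every '/'-separated segment, keeping the separators.
template <typename F>
std::string mapSegments(const std::string &path, F f)
{
    std::string out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t slash = path.find('/', start);
        out += f(path.substr(start, slash - start));  // npos - start clamps to the end
        if (slash == std::string::npos)
            return out;
        out += '/';
        start = slash + 1;
    }
}

std::string escapePath(const std::string &p) { return mapSegments(p, escapeSegment); }
std::string unescapePath(const std::string &p) { return mapSegments(p, unescapeSegment); }
```

With this, "/tmp/caf\xE9.txt" (a Latin-1 name) becomes "/tmp/@--caf%E9.txt", while ordinary UTF-8 names are left untouched. Escaping names that already begin with "@--" keeps decoding unambiguous.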
>
>> Anyway, what I recommend for now:
>> 
>>  1) immediately, de-inline QFile::decodeName and QFile::encodeName
>>  2) un-deprecate them and update the text in changes-5.0.0
>>
>well, why not.

OK, but only for the specific use case of converting arbitrary 8-bit data
(including sequences that are invalid in the locale encoding) to a QString
and back. There will be no way to set custom decoder functions.
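One established technique for lossless round trips of arbitrary bytes is the "surrogateescape" scheme from Python's PEP 383: bytes that do not decode are mapped to lone surrogates U+DC80..U+DCFF and turned back into the same bytes on encoding. The sketch below assumes a UTF-8 locale and uses std::u16string as a stand-in for QString; it is not what Qt implements, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-16. Any byte that is not part of a well-formed
// sequence becomes the lone surrogate U+DC00 + byte (U+DC80..U+DCFF).
std::u16string decodeNameEscaped(const std::string &bytes)
{
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = bytes[i];
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        bool ok = len != 0 && i + len <= bytes.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cc = bytes[i + k];
            ok = (cc & 0xC0) == 0x80;
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-8-encoded surrogates and values > U+10FFFF.
        ok = ok && cp >= minForLen[len] && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF);
        if (!ok) {
            out += static_cast<char16_t>(0xDC00 + c);  // escape this byte, resync on the next
            ++i;
        } else if (cp >= 0x10000) {
            cp -= 0x10000;  // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            i += len;
        } else {
            out += static_cast<char16_t>(cp);
            i += len;
        }
    }
    return out;
}

// Inverse of decodeNameEscaped: U+DC80..U+DCFF become the original raw bytes,
// everything else is encoded as UTF-8. Other lone surrogates never come out of
// decodeNameEscaped and are not handled specially.
std::string encodeNameEscaped(const std::u16string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        if (cp >= 0xDC80 && cp <= 0xDCFF) {
            out += static_cast<char>(cp - 0xDC00);  // escaped raw byte
            continue;
        }
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()
                && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // consumed the low surrogate of the pair
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The surrogate pair is checked before the escape range, so a valid character such as U+10080 (whose low surrogate is U+DC80) is not mistaken for an escaped byte.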

What do we do with this on Windows? I am almost tempted to not even offer
the methods there, as everything's UTF-16 anyway, so the problem doesn't
exist.

>>  3) make QProcess use QFile::encodeName for its arguments (no-op right
>>     now)
>>  4) make QCoreApplication parse its arguments using QFile::decodeName
>>     (no-op right now)

On Mac and Unix. On Windows we need a different solution. There we
actually get the arguments in UTF-16 (in WinMain()). Currently we still
use toLocal8Bit() in there, and that can/will probably break badly, as
the local 8-bit encoding on Windows is usually not UTF-8.

So qtmain_win.cpp needs some fixing anyway.
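A portable illustration of why the detour through 8 bits loses data: ISO-8859-1 stands in for a single-byte Windows ANSI code page (an assumption; real code pages such as 1252 differ from it in places), and the helper names are hypothetical, not Qt API.

```cpp
#include <cassert>
#include <string>

// Stand-in for a conversion to a single-byte local code page (ISO-8859-1 assumed):
// every UTF-16 unit above U+00FF, including surrogates, becomes '?'.
std::string toLatin1Lossy(const std::u16string &arg)
{
    std::string out;
    for (char16_t u : arg)
        out += u <= 0xFF ? static_cast<char>(u) : '?';
    return out;
}

// The reverse direction, as the application would later decode argv.
std::u16string fromLatin1(const std::string &bytes)
{
    std::u16string out;
    for (unsigned char c : bytes)
        out += static_cast<char16_t>(c);
    return out;
}

// True if an argument survives the trip through the 8-bit encoding.
bool survivesLocal8Bit(const std::u16string &arg)
{
    return fromLatin1(toLatin1Lossy(arg)) == arg;
}
```

An argument like "café.txt" survives, while a Greek file name does not; a character outside the BMP even turns into two '?' bytes, one per surrogate.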

>>  5) idem for Laszlo's command-line parser class
>> 
>no. see first paragraph. doing this would only increase the mess.
>
>> Later, we can decide whether to add escaping to those functions.
>> 
>> However, I cannot agree with bringing the setter functions back. I do
>> agree with removing them completely, though.
>> 
>ack

Yes. No setters for the encode/decode functions. Either we handle this
properly in Qt (giving full round-trip conversions for arbitrary 8-bit
sequences), or not at all.

Cheers,
Lars



