[Qt-interest] QFtp put functions has a bug?
Diego Schulz
dschulz at gmail.com
Thu Feb 12 20:44:40 CET 2009
On Wed, Feb 11, 2009 at 2:23 PM, Thiago Macieira
<thiago.macieira at trolltech.com> wrote:
> On Wednesday 11 February 2009, at 16:37:18, Srdjan Todorovic wrote:
>> Doesn't that depend on your FTP server?
>
> It does.
>
>> Curious... so how does this relate to the rest of the Internet? Are
>> URLs also i18n? What about web servers? Libc? DNS? Does the FTP
>> protocol even support i18n filenames?
>
> There's no definitive answer. Everything is different everywhere. Here's what I
> can remember on the subject:
>
> URLs: they are always in UTF-8, as defined by the IRI (Internationalised
> Resource Identifier) specification: RFC 3987. It's the companion document to the
> actual URI specification (RFC 3986), which in turn defines that URIs can contain
> any arbitrary data (remember that URL is a subset of URI). According to the
> URI spec, any character that is neither unreserved nor acting as a reserved
> delimiter must appear encoded as %HH. However, user agents (your browser)
> usually decode the %HH into nicer characters for display. They also encode
> what you type as %HH. In both cases, they have to use UTF-8.
>
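
For what it's worth, here is a minimal sketch of that %HH round trip. It uses Python's standard library rather than Qt, and the sample path is made up for illustration. `quote` and `unquote` both default to UTF-8.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing U+00E9 (e with acute), written as an escape
# so the example does not depend on this file's own encoding.
path = "/files/r\u00e9sum\u00e9.txt"

# quote() encodes non-ASCII characters as UTF-8 bytes, then as %HH.
# "/" is left alone because it is in the default "safe" set.
encoded = quote(path)

# unquote() reverses the process, again assuming UTF-8.
decoded = unquote(encoded)

print(encoded)
print(decoded == path)
```

U+00E9 becomes the two bytes C3 A9 in UTF-8, so each accented letter turns into two %HH groups.
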
> QUrl supports URIs and IRIs without a problem. It uses UTF-8 as would be
> expected. The missing feature is the "pretty URL" format, which is a QString,
> with the %HH sequences that are valid UTF-8 decoded into characters, but
> leaving the non-UTF-8 sequences encoded.
>
> Web servers: nothing special, they are just sending/receiving items based on
> the path + query components of a URI. Therefore, the URI/IRI definitions take
> precedence.
>
> QHttp pays no mind to this, since it takes the path + query component
> directly. You're supposed to encode it yourself. QNetworkAccessManager
> operates on QUrl, so this is taken care of already.
>
> libc and POSIX APIs don't care about the encoding. They operate on arbitrary
> 8-bit data. As long as it's NUL-terminated, it's fine for them. Note that
> Windows extends the ANSI and POSIX APIs with wide-character variants that
> operate on UTF-16 directly. Also note that Windows uses UTF-16 for the
> wide-char version, but a legacy encoding (not UTF-8) for the 8-bit version.
> As for Mac OS X, it uses UTF-8 exclusively as the locale codec, but unlike
> most other uses of UTF-8, it prefers the decomposed normalisation form (NFD).
>
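
To make the NFD point concrete, here is a small sketch in Python (standard library only, not Qt). The sample name is made up. The same visible file name produces different UTF-8 bytes depending on the normalisation form.

```python
import unicodedata

# Hypothetical file name; U+00E9 is the precomposed "e with acute".
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # composed: one code point for the accent
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301 combining acute

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(len(nfc), nfc_bytes)
print(len(nfd), nfd_bytes)
print(nfc == nfd)  # the strings differ unless you normalise before comparing
```

So a byte-wise comparison of two names that look identical can fail unless both sides are normalised to the same form first.
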
> Qt takes care of almost all of that for you. The Qt API dealing with file names
> operates exclusively on QString, meaning that it is compatible already with
> Windows, and it encodes as necessary on Unix systems. The problem lies only
> with file names that contain arbitrary bytes that are not valid in the locale
> encoding. You can create and deal with those files with the POSIX 8-bit API,
> but not with Qt's QString-based API.
>
> DNS is a separate issue. The DNS packet format is limited to the so-called LDH
> characters: Letters, Digits, Hyphen. It could support more, but chances are
> DNS servers along the way would drop the packets. So for DNS, there's a series
> of RFCs defining IDNA: Internationalised Domain Names for Applications.
> Basically, they defined a new Unicode encoding called Punycode, which is
> LDH-compatible. Encoded labels are prefixed with xn--, indicating that
> Punycode should be used to decode them.
>
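
A quick way to see the xn-- form is Python's built-in codecs (standard library only, not Qt). The "idna" codec implements the original IDNA (2003) rules, and the host name is only an example.

```python
# Example host name with one non-ASCII label (U+00FC, u with diaeresis).
host = "b\u00fccher.example"

# The "idna" codec converts each non-ASCII label to Punycode and adds "xn--".
ascii_host = host.encode("idna")

# The raw Punycode of the label alone, without the prefix.
label_punycode = "b\u00fccher".encode("punycode")

# Decoding goes back to the Unicode form.
round_trip = ascii_host.decode("idna")

print(ascii_host)
print(label_punycode)
print(round_trip == host)
```

Only the label that needs it is rewritten; the plain ASCII label is left untouched.
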
> QHostInfo supports this, of course. In fact, the support is also in QUrl, so
> it's out of the box. QTcpSocket and QUdpSocket do lookups using QHostInfo, so
> they support it too.
>
> Finally, the FTP specification predates any internationalisation support. It's
> just an 8-bit stream, suffering from the same problems as the POSIX APIs.
> Encoding is not defined, so the only way to get the proper encoding for remote
> servers is to ask the user.
>
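
This is easy to demonstrate with a sketch in Python (standard library only, no FTP connection involved). The bytes below are a made-up example of what a server might send in a listing, and `decode_listing_name` is a hypothetical helper, not part of any library.

```python
def decode_listing_name(raw: bytes, encoding: str) -> str:
    """Decode a raw file name using the encoding the user selected.

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return raw.decode(encoding, errors="replace")


# Made-up raw bytes as they might arrive from a server.
raw = b"r\xe9sum\xe9.txt"

as_latin1 = decode_listing_name(raw, "latin-1")
as_cp1251 = decode_listing_name(raw, "cp1251")
as_utf8 = decode_listing_name(raw, "utf-8")

print(as_latin1)
print(as_cp1251)
print(as_utf8)
```

All three decodings are "valid" in the sense that they produce a string, but only the user knows which one matches what the server's owner intended.
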
> --
> Thiago Macieira - thiago.macieira (AT) nokia.com
> Senior Product Manager - Nokia, Qt Software
> Sandakerveien 116, NO-0402 Oslo, Norway
>
Excellent clarification! Thanks Thiago!