[Qt-interest] QFtp put functions has a bug?
Thiago Macieira
thiago.macieira at trolltech.com
Wed Feb 11 18:23:50 CET 2009
On Wednesday 11 February 2009, at 16:37:18, Srdjan Todorovic wrote:
> Doesn't that depend on your FTP server?
It does.
> Curious... so how does this relate to the rest of the Internet? Are
> URLs also i18n? What about web servers? Libc? DNS? Does the FTP
> protocol even support i18n filenames?
There's no definitive answer. Each protocol and API handles it differently. Here's what I
can remember on the subject:
URLs: they are always in UTF-8, as defined by the IRI (Internationalised
Resource Identifier) specification, RFC 3987. It is the companion document to the
URI specification itself (RFC 3986), which in turn says that URIs can carry
arbitrary data (remember that URLs are a subset of URIs). According to the
URI spec, any octet that is not an unreserved character (letters, digits, "-",
".", "_", "~") must appear percent-encoded as %HH, unless it is a reserved
character acting as a delimiter. However, user agents (your browser) usually
decode the %HH sequences back into readable characters, and they also convert
what you type into %HH sequences. In both cases, they have to use UTF-8.
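A minimal sketch of that percent-encoding step, written in standard C++ rather than Qt so it stands alone. It assumes the input is already UTF-8 and is a single path segment or query value, so "/" gets encoded too; percentEncode is a hypothetical name. In Qt itself, QUrl::toPercentEncoding() covers this.

```cpp
#include <string>

// Percent-encodes every octet of a UTF-8 string except the RFC 3986
// "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
// Assumption: the input is one path segment or query value, so reserved
// characters such as '/' are encoded as well.
std::string percentEncode(const std::string& utf8) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                                c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            // Each byte of a multi-byte UTF-8 sequence becomes its own %HH.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

For example, "Müller" becomes M%C3%BCller: the two UTF-8 bytes of "ü" are each written as %HH.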
QUrl supports URIs and IRIs without a problem. It uses UTF-8, as would be
expected. The missing feature is a "pretty URL" format: a QString in which the
%HH sequences that form valid UTF-8 are decoded into characters, while the
non-UTF-8 sequences stay encoded.
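To make the "pretty URL" idea concrete, here is a sketch of that decoding rule in standard C++. prettyUrl and its helpers are hypothetical names, not QUrl API. One design choice is an assumption of this sketch: ASCII escapes such as %20 or %2F stay encoded, since decoding them could change how the URL is parsed; only runs of %HH that form valid multi-byte UTF-8 are decoded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length of a valid multi-byte UTF-8 sequence starting at b[i], or 0 if
// there is none (ASCII bytes also yield 0, so they stay percent-encoded).
static std::size_t utf8SeqLen(const std::vector<unsigned char>& b, std::size_t i) {
    const unsigned char c = b[i];
    std::size_t len;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (c >= 0xC2 && c <= 0xDF) {
        len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3;
        if (c == 0xE0) lo = 0xA0;  // reject overlong encodings
        if (c == 0xED) hi = 0x9F;  // reject UTF-16 surrogates
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4;
        if (c == 0xF0) lo = 0x90;  // reject overlong encodings
        if (c == 0xF4) hi = 0x8F;  // reject code points above U+10FFFF
    } else {
        return 0;
    }
    if (i + len > b.size() || b[i + 1] < lo || b[i + 1] > hi) return 0;
    for (std::size_t k = 2; k < len; ++k)
        if (b[i + k] < 0x80 || b[i + k] > 0xBF) return 0;
    return len;
}

static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical "pretty URL" rendering of a percent-encoded string.
std::string prettyUrl(const std::string& encoded) {
    std::string out;
    std::size_t i = 0;
    while (i < encoded.size()) {
        if (encoded[i] != '%') {
            out += encoded[i++];
            continue;
        }
        // Collect a run of consecutive well-formed %HH escapes.
        std::vector<unsigned char> bytes;
        std::vector<std::string> escapes;  // original spelling of each escape
        while (i + 2 < encoded.size() && encoded[i] == '%' &&
               hexValue(encoded[i + 1]) >= 0 && hexValue(encoded[i + 2]) >= 0) {
            bytes.push_back(static_cast<unsigned char>(
                hexValue(encoded[i + 1]) * 16 + hexValue(encoded[i + 2])));
            escapes.push_back(encoded.substr(i, 3));
            i += 3;
        }
        if (bytes.empty()) {  // a stray '%' that is not a valid escape
            out += encoded[i++];
            continue;
        }
        // Decode valid multi-byte sequences; keep everything else encoded.
        for (std::size_t j = 0; j < bytes.size();) {
            const std::size_t len = utf8SeqLen(bytes, j);
            if (len > 0) {
                out.append(bytes.begin() + j, bytes.begin() + j + len);
                j += len;
            } else {
                out += escapes[j++];
            }
        }
    }
    return out;
}
```

For instance, /caf%C3%A9%FF renders as /café%FF: %C3%A9 is valid UTF-8 for "é", while %FF on its own is not.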
Web servers: nothing special; they just send and receive resources based on
the path + query components of a URI, so the URI/IRI definitions take
precedence.
QHttp ignores this, since it takes the path + query component as-is; you are
supposed to encode it yourself. QNetworkAccessManager operates on QUrl, so
this is already taken care of.
libc and POSIX APIs don't care about the encoding. They operate on arbitrary
8-bit data; as long as it's NUL-terminated, it's fine for them. Windows
extends the ANSI and POSIX APIs with wide-character variants that operate
directly on UTF-16, but its 8-bit variants use the legacy ANSI code page, not
UTF-8. Mac OS X, on the other hand, always uses UTF-8 as the locale codec, but
unlike most other users of UTF-8, it prefers the decomposed normalisation form
(NFD).
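The NFD point matters whenever names are compared byte by byte. This small C++ illustration (the variable and function names are made up) spells "café" in both normalisation forms; they display identically but are different byte strings.

```cpp
#include <string>

// "café" in NFC (precomposed): U+00E9 LATIN SMALL LETTER E WITH ACUTE -> C3 A9
const std::string cafeNfc = "caf\xC3\xA9";
// "café" in NFD (decomposed): 'e' followed by U+0301 COMBINING ACUTE ACCENT -> 65 CC 81
const std::string cafeNfd = "cafe\xCC\x81";

// A byte-wise comparison, which is effectively what the 8-bit POSIX APIs do,
// treats the two spellings as different names.
bool sameBytes(const std::string& a, const std::string& b) { return a == b; }
```

So a name created as NFC on one system and read back as NFD from another no longer matches unless one side normalises first.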
Qt takes care of almost all of that for you. The Qt API dealing with file names
operates exclusively on QString, meaning it is already compatible with Windows
and encodes names as needed on Unix systems. The only problem is with
arbitrary file names that cannot be represented in the locale encoding. You can
create and handle those files with the POSIX 8-bit API, but not with Qt's
QString-based API.
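As a rough check for that problem case, the sketch below tests whether a byte-string file name is well-formed UTF-8. It assumes a UTF-8 locale, which is common on Linux but not guaranteed; isValidUtf8 is a hypothetical helper, not part of Qt or libc.

```cpp
#include <cstddef>
#include <string>

// Returns true if `name` is well-formed UTF-8 (no overlong forms, no
// surrogates, nothing above U+10FFFF). Assumption: the locale codec is UTF-8.
bool isValidUtf8(const std::string& name) {
    const std::size_t n = name.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c < 0x80) { ++i; continue; }     // plain ASCII
        std::size_t len;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;
            if (c == 0xED) hi = 0x9F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;
            if (c == 0xF4) hi = 0x8F;
        } else {
            return false;  // stray continuation byte, overlong lead (C0, C1) or F5..FF
        }
        if (i + len > n) return false;  // truncated sequence
        const unsigned char c1 = static_cast<unsigned char>(name[i + 1]);
        if (c1 < lo || c1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char ck = static_cast<unsigned char>(name[i + k]);
            if (ck < 0x80 || ck > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

For example, "café" written by a Latin-1 program is the bytes 63 61 66 E9, which fails this check under a UTF-8 locale.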
DNS is a separate issue. In practice, DNS host names are limited to the so-called
LDH characters: Letters, Digits, Hyphen. The packet format could carry more, but
chances are DNS servers along the way would drop such packets. So for DNS, there
is a series of RFCs defining IDNA: Internationalised Domain Names for Applications.
Basically, they define an encoding of Unicode called Punycode (RFC 3492) whose
output is LDH-compatible. Encoded labels are prefixed with xn--, indicating that
Punycode must be used to decode them.
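For reference, here is a sketch of the Punycode encoder described in RFC 3492, in standard C++. It handles a single label only, omits the RFC's overflow checks (64-bit arithmetic is ample for DNS-sized labels), and skips the nameprep step (case folding and normalisation) that full IDNA performs before encoding. The function names are hypothetical, not Qt API.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Parameters fixed by RFC 3492 for Punycode.
constexpr std::uint64_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38,
                        kDamp = 700, kInitialBias = 72, kInitialN = 128;

// Maps a digit value 0..35 to 'a'..'z', '0'..'9'.
char punycodeDigit(std::uint64_t d) {
    return d < 26 ? static_cast<char>('a' + d) : static_cast<char>('0' + (d - 26));
}

// Bias adaptation function (RFC 3492, section 6.1).
std::uint64_t punycodeAdapt(std::uint64_t delta, std::uint64_t numPoints, bool firstTime) {
    delta = firstTime ? delta / kDamp : delta / 2;
    delta += delta / numPoints;
    std::uint64_t k = 0;
    while (delta > ((kBase - kTMin) * kTMax) / 2) {
        delta /= kBase - kTMin;
        k += kBase;
    }
    return k + ((kBase - kTMin + 1) * delta) / (delta + kSkew);
}

// Encodes one label (a sequence of Unicode code points) as Punycode.
std::string punycodeEncode(const std::u32string& input) {
    std::string output;
    for (char32_t c : input)
        if (c < 0x80) output += static_cast<char>(c);  // basic code points first
    const std::uint64_t b = output.size();
    std::uint64_t h = b;
    if (b > 0) output += '-';

    std::uint64_t n = kInitialN, delta = 0, bias = kInitialBias;
    while (h < input.size()) {
        std::uint64_t m = UINT64_MAX;  // smallest code point not yet handled
        for (char32_t c : input)
            if (c >= n && c < m) m = c;
        delta += (m - n) * (h + 1);
        n = m;
        for (char32_t c : input) {
            if (c < n) ++delta;
            if (c == n) {
                // Emit delta as a variable-length base-36 integer.
                std::uint64_t q = delta;
                for (std::uint64_t k = kBase;; k += kBase) {
                    const std::uint64_t t =
                        k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
                    if (q < t) break;
                    output += punycodeDigit(t + (q - t) % (kBase - t));
                    q = (q - t) / (kBase - t);
                }
                output += punycodeDigit(q);
                bias = punycodeAdapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return output;
}

// Hypothetical helper: ASCII labels pass through, others get the "xn--" prefix.
std::string toAsciiLabel(const std::u32string& label) {
    const bool allAscii = std::all_of(label.begin(), label.end(),
                                      [](char32_t c) { return c < 0x80; });
    if (!allAscii) return "xn--" + punycodeEncode(label);
    std::string out;
    for (char32_t c : label) out += static_cast<char>(c);
    return out;
}
```

With this sketch, "münchen" becomes xn--mnchen-3ya: the basic characters are copied first, then the delimiter, then the encoded position of "ü".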
QHostInfo supports this, of course. In fact, QUrl supports it too, so it
works out of the box. QTcpSocket and QUdpSocket do their lookups using
QHostInfo, so they support it as well.
Finally, the base FTP specification (RFC 959) predates any internationalisation
support. It's just an 8-bit stream, suffering from the same problems as the
POSIX APIs. Since the encoding is not defined, the only reliable way to find
out which encoding a remote server uses is to ask the user.
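Once the user has named the server's encoding, the conversion itself is simple. The sketch below assumes the user picked ISO-8859-1 (Latin-1) and hard-codes that one case in standard C++ (C++17 for std::optional); in a Qt 4 application, a user-selected QTextCodec would play this role. The function names are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Converts a UTF-8 file name to Latin-1 bytes for a server the user says
// uses ISO-8859-1. Returns std::nullopt if the name contains a character
// outside U+0000..U+00FF (or is not valid UTF-8).
std::optional<std::string> utf8ToLatin1(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) { out += static_cast<char>(c); continue; }
        // U+0080..U+00FF are encoded in UTF-8 as C2 xx or C3 xx.
        if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            const unsigned char c1 = static_cast<unsigned char>(utf8[i + 1]);
            if (c1 >= 0x80 && c1 <= 0xBF) {
                out += static_cast<char>(((c & 0x03) << 6) | (c1 & 0x3F));
                ++i;
                continue;
            }
        }
        return std::nullopt;  // not representable in Latin-1
    }
    return out;
}

// The reverse direction, e.g. for names in a LIST reply from the server.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Returning std::nullopt instead of substituting "?" lets the caller tell the user that the name cannot be stored on that server as-is.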
--
Thiago Macieira - thiago.macieira (AT) nokia.com
Senior Product Manager - Nokia, Qt Software
Sandakerveien 116, NO-0402 Oslo, Norway