[Development] QString::fromAscii & toAscii's future

Thiago Macieira thiago.macieira at intel.com
Tue Apr 24 14:27:10 CEST 2012


Since the Qt 3 days [1], QString::fromAscii has meant "from C strings", with 
"Ascii" standing for "that which the developer writes" rather than its proper 
definition of US-ASCII, a.k.a. ANSI X3.4-1986. By default, the codec was 
Latin 1 in Qt 3 and Qt 4.

In addition, QTextCodec::setCodecForCStrings could be used to change the 
codec. As posted by our Chinese friends a few days ago, this was useful for 
them.

We have decided to change QString's const char* constructor, as well as the 
methods that take const char* directly, to operate on UTF-8. That decision has 
already been made, so let's not re-discuss it here. I'm in the process of 
implementing it.

The question I have is: what shall we do with QString::fromAscii?

Options are:
 1) leave it as it is now -- that is, a synonym for QString::fromLatin1

 2) revert to what it was in Qt 4, that is, the equivalent of 
	return codecForCStrings->toUnicode(str, len);
    and re-add QTextCodec::setCodecForCStrings.

 3) change it so that it is *really* US-ASCII (accepting only bytes from 0 to 
    127)

 4) change it so that it matches the QString constructor. In other words, 
    make it a synonym for QString::fromUtf8.

 5) something else
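
To make the difference between options 1, 3 and 4 concrete, here is a rough 
sketch in plain C++ with no Qt involved. The function names are made up for 
illustration and are not Qt API, and the UTF-8 decoder is deliberately 
simplified to 1- and 2-byte sequences, so treat it as an illustration of the 
behaviours rather than as how QString does or would implement them.

```cpp
#include <cstddef>
#include <string>

// Option 1 (hypothetical helper): Latin 1. Every byte maps to the code
// point with the same numeric value, so no input is ever rejected.
std::u16string decodeLatin1(const std::string &bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Option 3 (hypothetical helper): strict US-ASCII. Returns false as soon as
// a byte with the high bit set is seen; 'out' is then only partially filled.
bool decodeStrictAscii(const std::string &bytes, std::u16string &out)
{
    out.clear();
    for (unsigned char c : bytes) {
        if (c > 0x7F)
            return false;
        out.push_back(static_cast<char16_t>(c));
    }
    return true;
}

// Option 4 (hypothetical helper): UTF-8, simplified to 1- and 2-byte
// sequences (U+0000..U+07FF). Anything else becomes U+FFFD. A real decoder
// has to handle longer sequences and more error cases.
std::u16string decodeUtf8Simplified(const std::string &bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char16_t>(c));
        } else if (c >= 0xC2 && c <= 0xDF && i + 1 < bytes.size()
                   && (static_cast<unsigned char>(bytes[i + 1]) & 0xC0) == 0x80) {
            const unsigned char c2 = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(static_cast<char16_t>(((c & 0x1F) << 6) | (c2 & 0x3F)));
            ++i; // consumed the continuation byte
        } else {
            out.push_back(u'\uFFFD');
        }
    }
    return out;
}
```

For the five bytes "caf\xC3\xA9", the Latin 1 decoder yields five code units 
(ending in U+00C3 U+00A9), the strict US-ASCII decoder rejects the input, and 
the UTF-8 decoder yields four code units ending in U+00E9. For pure US-ASCII 
input such as "abc", all three agree.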


I advise against option 2. My current change is implementing option 4, but 
it's only a crude change for testing. Once I implement the changeover 
properly, we can pick any behaviour we want.

My recommendation is a variant of option 3: we document that it accepts only 
US-ASCII and that it has undefined behaviour when the input isn't US-ASCII 
compliant. That way, we can make it equal to fromLatin1() and avoid having to 
write code for checking the high bit.

Consequently, toAscii() would also be toLatin1() with the same "undefined 
behaviour" notice. If someone so chooses in the future, they could make a 
faster version of toAscii() that doesn't check for non-Latin1 characters.
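
As a rough illustration of what such a faster toAscii() could look like, the 
sketch below contrasts a checked narrowing (in the spirit of toLatin1(), which 
replaces unrepresentable characters with '?') with an unchecked one. It is 
plain C++ with made-up function names, not Qt code. In this sketch the 
unchecked version simply keeps the low byte; under the proposal the result for 
such input would be documented as undefined, so nobody should rely on it.

```cpp
#include <string>

// Checked narrowing (hypothetical helper): code units above 0xFF cannot be
// represented in Latin 1 and are replaced with '?'.
std::string narrowChecked(const std::u16string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t u : text) {
        if (u > 0xFF)
            out.push_back('?');
        else
            out.push_back(static_cast<char>(static_cast<unsigned char>(u)));
    }
    return out;
}

// Unchecked narrowing (hypothetical helper): no range test at all, the low
// byte of each code unit is kept. Correct only if the caller guarantees the
// input is in range.
std::string narrowUnchecked(const std::u16string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t u : text)
        out.push_back(static_cast<char>(static_cast<unsigned char>(u & 0xFF)));
    return out;
}
```

For input that really is US-ASCII, both functions return the same bytes, which 
is why documenting the restriction is enough to allow the cheaper version. 
They only diverge on input the caller was told not to pass, for example 
U+20AC, where the checked version gives '?' and the unchecked one gives the 
byte 0xAC.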


Final note: I am proposing we do not change the "ASCII" macros. That is, 
QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII continue doing the same thing 
they do now. We can introduce new macros if someone can come up with better 
names, but the current ones continue to work.

[1] http://doc.trolltech.com/3.3/qstring.html#fromAscii
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden