[Development] A bug or not a bug, that's the question

Kurt Pattyn pattyn.kurt at gmail.com
Mon Aug 26 13:53:38 CEST 2013


Hi,

When implementing WebSockets, I ran into a problem with the QTextCodec class.
This is a code snippet:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");
codec->toUnicode(someUtf8StringContainingNonCharacters, …);

When toUnicode() is called on a UTF-8 string containing Unicode non-characters (for example U+FFFE or U+FDD0), QTextCodec reports a conversion error.
This is intended behaviour of the QTextCodec class: input containing non-characters is explicitly tested in the unit tests, where it is expected to fail.

However, non-characters are valid Unicode code points and should be preserved as is; the Unicode Consortium published a corrigendum clarifying how non-characters should be handled: http://www.unicode.org/versions/corrigendum9.html.
Of course, non-characters are meant for internal use only and have no 'standard' meaning; their interpretation is application-dependent.
Also, displaying a non-character makes no sense, as non-characters are not meant to be displayed.

Because I use QTextCodec in my QWebSockets implementation, it inherits the same problem: the Autobahn tests that specifically check the acceptance of non-characters all fail. I don't really have a problem with that (see the Rationale at http://kurtpattyn.github.io/QWebSockets/), as text is just text in my opinion. If one wants to exchange special characters in a text, I recommend using a binary format for that.

But it does make QTextCodec non-compliant with Unicode.

So my question is: should we consider this a bug, and thus file a bug report in Jira, or can we live with it?
Note that fixing this issue could affect QString as well, as it would need to handle those non-characters too. Maybe a flag could be added to QTextCodec to control how those characters are handled?

Kurt

