[Qt-creator] "Editing not possible" solution

Frédéric Marchal frederic.marchal at wowtechnology.com
Wed Jan 21 16:07:45 CET 2015


2015-01-21 9:49 GMT+01:00 Ziller Eike <Eike.Ziller at theqtcompany.com>:
>
>> On Jan 20, 2015, at 11:48 PM, Pawel <pawelfaron87 at wp.pl> wrote:
>>
>> Hello,
>>
>> In my company we work in multinational teams. Many people use strange file encodings and they don't plan to change them...
>> In QtCreator I quite often see the "Editing not possible" error. I know that file decoding is a difficult subject, but I'm sure we can do better than this.
>>
>> I've implemented some simple code to help a bit. It is located in textdocument.cpp, and differs from the current solution as follows:
>>
>> - [Current solution] In the read() functions, when decoding with the default codec fails, we return an error.
>> - [My solution] In the read() functions, when decoding with the default codec fails, I try all available codecs (QTextCodec::availableCodecs()), and if any of them succeeds I return success.
>>
>> What do you think about this?
>>
>> Full code can be reviewed here:
>> https://codereview.qt-project.org/#/c/104124/
>
> There is this bigger patch proposal https://codereview.qt-project.org/83259 which makes Qt Creator use the Mozilla universalchardet library for a better heuristic in choosing the right encoding...
> Unfortunately that adds 10k lines of code.
> I’d very much prefer something in between your suggestion and that patch.

From the document distributed along with the source code
(http://mxr.mozilla.org/seamonkey/source/extensions/universalchardet/doc/UniversalCharsetDetection.doc):

"Input text is composed of words/sentences readable to readers of a
particular language.  (= The data is not gibberish.)"

I wonder how stable it is when fed source code, which mostly
consists of what people would call gibberish.

In particular, the second algorithm to detect the encoding is based on
character distribution. The proposed patch seems to analyze the whole
source code instead of just comments and/or literal strings. If my
assumption is correct, the character distribution would be wrong and
the detection would be unreliable.
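
To illustrate what "just comments and/or literal strings" could mean in
practice, here is a minimal sketch, not taken from the proposed patch.
The function name extractNaturalText is hypothetical, and the scanner is
deliberately simplified: it ignores raw string literals, line
continuations and digit separators. It collects only the bytes a
distribution-based detector would ideally be fed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the bytes found inside // and /* */ comments
// and inside "..." string literals of C-like source text, one chunk per line.
// Escaped characters inside literals are skipped; character literals are ignored.
std::string extractNaturalText(const std::string &src)
{
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else if (c == '"') state = State::String;
            else if (c == '\'') state = State::Char;
            break;
        case State::LineComment:
            if (c == '\n') { state = State::Code; out += '\n'; }
            else out += c;
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; out += '\n'; }
            else out += c;
            break;
        case State::String:
            if (c == '\\' && i + 1 < src.size()) ++i;      // skip escaped character
            else if (c == '"') { state = State::Code; out += '\n'; }
            else out += c;
            break;
        case State::Char:
            if (c == '\\' && i + 1 < src.size()) ++i;      // skip escaped character
            else if (c == '\'') state = State::Code;
            break;
        }
    }
    return out;
}
```

Only the returned text would then be handed to the detector, so that
identifiers, operators and keywords do not skew the character statistics.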

Note that simply running 'file *.cpp' on a project directory under
Linux does report both UTF-8 and ISO-8859-1 encoded files. Perhaps its
algorithm could be the intermediate solution?

Frederic


