[Qt-creator] "Editing not possible" solution
Ziller Eike
Eike.Ziller at theqtcompany.com
Fri Jan 23 15:27:05 CET 2015
> On Jan 21, 2015, at 4:07 PM, Frédéric Marchal <frederic.marchal at wowtechnology.com> wrote:
>
> 2015-01-21 9:49 GMT+01:00 Ziller Eike <Eike.Ziller at theqtcompany.com>:
>>
>>> On Jan 20, 2015, at 11:48 PM, Pawel <pawelfaron87 at wp.pl> wrote:
>>>
>>> Hello,
>>>
>>> In my company we work in multinational teams. Many people use strange file encodings and don't plan to change them...
>>> In Qt Creator I quite often see the "Editing not possible" error. I know that file decoding is a difficult subject, but I'm sure we can do better than this.
>>>
>>> I've implemented a simple change to help a bit. It is located in textdocument.cpp and differs from the current solution as follows:
>>>
>>> - [Current solution] In the read() functions, when decoding with the default codec fails, we return an error.
>>> - [My solution] In the read() functions, when decoding with the default codec fails, I try all available codecs (QTextCodec::availableCodecs()), and if any one succeeds, I return success.
>>>
>>> What do you think about this?
>>>
>>> Full code can be reviewed here:
>>> https://codereview.qt-project.org/#/c/104124/
>>
>> There is this bigger patch proposal https://codereview.qt-project.org/83259 which makes Qt Creator use the Mozilla universalchardet library for a better heuristic in choosing the right encoding...
>> Unfortunately, that adds 10k lines of code.
>> I’d very much prefer something in between your suggestion and that patch.
>
> From the document distributed along with the source code
> (http://mxr.mozilla.org/seamonkey/source/extensions/universalchardet/doc/UniversalCharsetDetection.doc):
>
> "Input text is composed of words/sentences readable to readers of a
> particular language. (= The data is not gibberish.)"
>
> I wonder how stable it is when fed with source code that mostly
> contains what people call gibberish.
>
> In particular, the second algorithm to detect the encoding is based on
> character distribution. The proposed patch seems to analyze the whole
> source code instead of just comments and/or literal strings. If my
> assumption is correct, the character distribution would be wrong and
> the detection would be unreliable.
>
> Note that simply running 'file *.cpp' on a project directory under
> Linux does report files encoded in UTF-8 and ISO-8859-1. Maybe its
> algorithm could be the intermediate solution?
Actually, when I run ‘file’ on a text file that contains Chinese characters in GB2312 (Simplified Chinese), it reports
/tmp/ch.txt: ISO-8859 text
That is not very helpful either ;)
- Anything can be opened with ISO Latin 1 without decoding errors (just that the result is “gibberish”)
- Even the other way round, e.g. “©Ötzi” (which doesn’t successfully decode with UTF-8, so our warning pops up) successfully decodes in GB2312 Simplified Chinese (just that the result is (probably) “gibberish”)
So, just trying any combination of text codecs to find one that succeeds will most probably result in the wrong encoding.
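To illustrate the point, here is a minimal C++ sketch using only the standard library (not Qt Creator or QTextCodec code; all function names are hypothetical). It pairs a strict UTF-8 check with an ISO Latin 1 decoder. Taking "©Ötzi" as stored in Latin 1 (bytes 0xA9 0xD6 't' 'z' 'i'), the UTF-8 check rejects it, but Latin 1 maps every byte 0x00-0xFF to U+0000-U+00FF and so "decodes" any input without error. A naive "first codec that succeeds" loop therefore falls through to Latin 1 for every file that is not valid UTF-8, whatever its real encoding.

```cpp
#include <cstddef>
#include <string>

// Strict UTF-8 validation: rejects stray continuation bytes, truncated
// sequences, overlong forms, UTF-16 surrogates and code points > U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    static const unsigned int minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned int cp = 0;
        if (c < 0x80) { ++i; continue; }                      // plain ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                 // continuation byte or invalid lead byte
        if (i + len > n) return false;     // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLen[len]) return false;           // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// ISO 8859-1 maps byte value b directly to code point U+00bb,
// so this "decoder" has no error case at all.
std::u32string decodeLatin1(const std::string &bytes)
{
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes)
        out.push_back(static_cast<char32_t>(c));
    return out;
}

// Naive strategy reduced to two candidates: report the first encoding
// whose decoder signals no error. Latin 1 is always accepted.
std::string firstEncodingThatDecodes(const std::string &bytes)
{
    if (isValidUtf8(bytes))
        return "UTF-8";
    return "ISO-8859-1";
}
```

Adding more candidates (GB2312 and so on) in front of Latin 1 does not help much either: as noted above, the same bytes can also decode "successfully" as GB2312, so absence of decoding errors says little about which encoding was actually intended.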
On the other hand, I do not want 10000 lines of code for a fancy guessing algorithm in Qt Creator, whose success is doubtful anyway, since we cannot assume that sensible code contains nothing that such an algorithm considers “gibberish”.
What I can imagine to reduce the pain in Qt Creator is letting it remember the encoding used for a file (if it differs from the default) and using it again when the file is reopened.
Maybe also a “fallback” encoding setting, exposed as a quick-access button directly in the “cannot open with encoding” info bar (so there would be buttons “select encoding” and “use XYZ”), for people who regularly have to handle one additional “funny” encoding (default: ISO Latin 1?). Would that be helpful?
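A minimal C++ sketch of the bookkeeping this proposal implies, assuming encodings are identified by name strings and remembered in a plain std::map keyed by file path. FileEncodingSettings and its methods are hypothetical names, not an existing Qt Creator API; persisting the map across sessions and the actual info bar UI are left out.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of the proposal, not Qt Creator code.
class FileEncodingSettings {
public:
    explicit FileEncodingSettings(std::string defaultEncoding,
                                  std::string fallbackEncoding = "ISO-8859-1")
        : m_default(std::move(defaultEncoding)),
          m_fallback(std::move(fallbackEncoding)) {}

    // Called after the user opened a file with an explicitly chosen encoding.
    // Only encodings that differ from the default are stored.
    void rememberEncoding(const std::string &filePath, const std::string &encoding)
    {
        if (encoding == m_default)
            m_perFile.erase(filePath);
        else
            m_perFile[filePath] = encoding;
    }

    // Encoding to use when (re)opening a file: the remembered one, else the default.
    const std::string &encodingFor(const std::string &filePath) const
    {
        const auto it = m_perFile.find(filePath);
        return it != m_perFile.end() ? it->second : m_default;
    }

    // Encoding offered by the one-click "use XYZ" button in the info bar.
    // It is only applied when the user clicks it, never silently.
    const std::string &fallbackEncoding() const { return m_fallback; }
    void setFallbackEncoding(std::string encoding) { m_fallback = std::move(encoding); }

private:
    std::string m_default;
    std::string m_fallback;
    std::map<std::string, std::string> m_perFile;
};
```

Storing only non-default encodings keeps the map small: files that decode fine with the default never get an entry, and switching a file back to the default removes its entry.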
Br, Eike
> Frederic
--
Eike Ziller, Senior Software Engineer - The Qt Company GmbH
The Qt Company GmbH, Rudower Chaussee 13, D-12489 Berlin
Geschäftsführer: Mika Pälsi, Juha Varelius, Tuula Haataja
Sitz der Gesellschaft: Berlin, Registergericht: Amtsgericht Charlottenburg, HRB 144331 B