[Qt-creator] "Editing not possible" solution

Sat Jan 24 19:36:06 CET 2015

Hello,

Thanks for all the advice. I now agree that it isn't worth sacrificing
performance for some exotic encodings.
With your advice I'm able to cope with my projects ;)

best regards
Pawel

On Fri, 23 Jan 2015 15:27:05 +0100, Ziller Eike  
<Eike.Ziller at theqtcompany.com> wrote:

>
>> On Jan 21, 2015, at 4:07 PM, Frédéric Marchal  
>> <frederic.marchal at wowtechnology.com> wrote:
>>
>> 2015-01-21 9:49 GMT+01:00 Ziller Eike <Eike.Ziller at theqtcompany.com>:
>>>
>>>> On Jan 20, 2015, at 11:48 PM, Pawel <pawelfaron87 at wp.pl> wrote:
>>>>
>>>> Hello,
>>>>
>>>> In my company we work in multinational teams. Many people use
>>>> strange file encodings and don't plan to change them...
>>>> In Qt Creator I quite often see the "Editing not possible" error. I
>>>> know that file decoding is a difficult subject, but I'm sure we can
>>>> do better than this.
>>>>
>>>> I've implemented some simple code to help a bit. It is located in
>>>> textdocument.cpp and differs from the current solution as follows:
>>>>
>>>> - [Current solution] In the read() functions, when decoding with the
>>>> default codec fails, we return an error.
>>>> - [My Solution] In the read() functions, when decoding with the default
>>>> codec fails, I try all available codecs (QTextCodec::availableCodecs())
>>>> and return success if any of them succeeds.
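The [My Solution] step can be sketched in portable C++ roughly as follows. This is not the code from the linked review: QTextCodec is replaced by hypothetical stand-in decoders (only strict ASCII and ISO-8859-1 are modeled), and all names are illustrative. The loop returns the first decoder that accepts the bytes, so the order of the candidate list alone decides the reported encoding.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A decoder turns raw bytes into Unicode code points, or returns
// std::nullopt when the bytes are invalid in its encoding.
using Decoder = std::function<std::optional<std::u32string>(const std::string&)>;

struct NamedDecoder {
    std::string name;  // stands in for a name from QTextCodec::availableCodecs()
    Decoder decode;
};

struct DecodeResult {
    std::string codecName;  // the first decoder that accepted the bytes
    std::u32string text;
};

// Strict 7-bit ASCII: any byte >= 0x80 is an error.
std::optional<std::u32string> decodeAscii(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        if (b >= 0x80)
            return std::nullopt;
        out.push_back(b);
    }
    return out;
}

// ISO-8859-1: every byte maps to the code point with the same value,
// so this decoder never fails.
std::optional<std::u32string> decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Try the candidates in order and report the first one that succeeds.
std::optional<DecodeResult> decodeWithFallback(const std::string& bytes,
                                               const std::vector<NamedDecoder>& candidates) {
    for (const NamedDecoder& candidate : candidates) {
        if (std::optional<std::u32string> text = candidate.decode(bytes))
            return DecodeResult{candidate.name, std::move(*text)};
    }
    return std::nullopt;  // every candidate rejected the bytes
}
```

With the candidates {US-ASCII, ISO-8859-1}, the bytes `caf\xE9` are rejected by ASCII and accepted by ISO-8859-1. Because ISO-8859-1 accepts every byte sequence, any candidate listed after it would never be tried.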
>>>>
>>>> What do you think about this?
>>>>
>>>> Full code can be reviewed here:
>>>> https://codereview.qt-project.org/#/c/104124/
>>>
>>> There is this bigger patch proposal  
>>> https://codereview.qt-project.org/83259 which makes Qt Creator use the  
>>> Mozilla universalchardet library for a better heuristic in choosing  
>>> the right encoding...
>>> Unfortunately that adds 10k lines of code.
>>> I’d very much prefer something in between your suggestion and that  
>>> patch.
>>
>> From the document distributed along with the source code
>> (http://mxr.mozilla.org/seamonkey/source/extensions/universalchardet/doc/UniversalCharsetDetection.doc):
>>
>> "Input text is composed of words/sentences readable to readers of a
>> particular language.  (= The data is not gibberish.)"
>>
>> I wonder how stable it is when fed source code that mostly
>> contains what people would call gibberish.
>>
>> In particular, the second algorithm to detect the encoding is based on
>> character distribution. The proposed patch seems to analyze the whole
>> source code instead of just comments and/or string literals. If my
>> assumption is correct, the character distribution would be wrong and
>> the detection would be unreliable.
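Analyzing only comments and string literals could be prototyped with a byte-level filter like the one below, run before the bytes are handed to a detector. extractNaturalText is a hypothetical helper, not part of Qt Creator or universalchardet. It is simplified (no raw string literals, no C++14 digit separators, no preprocessor handling) and it assumes an ASCII-compatible encoding in which the delimiter bytes never occur inside multibyte characters; that holds for UTF-8 and EUC-CN (GB2312) but not for Shift_JIS or GBK, whose trail bytes can be 0x5C (backslash).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Keeps only the bytes of // comments, /* */ comments and "..." string
// literals; code and character literals are dropped. Fragments are
// separated by '\n'.
std::string extractNaturalText(const std::string& src) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = i + 1 < src.size() ? src[i + 1] : '\0';
        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else if (c == '"') state = State::String;
            else if (c == '\'') state = State::Char;
            break;
        case State::LineComment:
            out.push_back(c);  // also keeps the terminating '\n'
            if (c == '\n') state = State::Code;
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { out.push_back('\n'); state = State::Code; ++i; }
            else out.push_back(c);
            break;
        case State::String:
            if (c == '\\') ++i;  // skip escapes such as \" or \n
            else if (c == '"') { out.push_back('\n'); state = State::Code; }
            else out.push_back(c);
            break;
        case State::Char:
            if (c == '\\') ++i;  // skip escaped character
            else if (c == '\'') state = State::Code;
            break;
        }
    }
    return out;
}
```

For `int x = 1; // hello` followed by `const char* s = "world"; /* note */`, the filter keeps only " hello", "world" and " note ", which is the kind of natural-language text a character-distribution analysis expects.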
>>
>> Note that simply running 'file *.cpp' on a project directory under
>> Linux does report files encoded in UTF-8 and ISO-8859-1. Maybe its
>> algorithm could be the intermediate solution?
>
> Actually, when I run ‘file’ on a text file that contains Chinese
> characters in GB2312 (Simplified Chinese), it reports
> /tmp/ch.txt: ISO-8859 text
>
> That is not very helpful either ;)
>
> - Anything can be opened with ISO Latin 1 without decoding errors (the
> result is just “gibberish”)
> - Even the other way round: e.g. “©Ötzi” (which doesn’t decode with
> UTF-8, so our warning pops up) decodes successfully as GB2312
> Simplified Chinese (though the result is probably “gibberish”)
>
> So simply trying text codecs until one succeeds will most probably
> result in the wrong encoding.
> On the other hand, I do not want 10000 lines of code for a fancy
> guessing algorithm in Qt Creator whose success is doubtful anyway,
> since we cannot assume that sensible code does not contain what that
> algorithm considers “gibberish”.
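Both observations can be reproduced with plain byte checks. The sketch below is illustrative C++, not QTextCodec: decodeLatin1 shows that ISO-8859-1 decoding is total, and looksLikeEucCn is a purely structural check for EUC-CN, the usual byte encoding of GB2312 (it does not verify that a byte pair is an assigned character). Assuming “©Ötzi” was saved as ISO-8859-1 (bytes A9 D6 74 7A 69), both functions accept it, yielding two unrelated interpretations of the same file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ISO-8859-1 maps each byte 0x00..0xFF to the code point with the same
// value, so decoding cannot fail, whatever the bytes really mean.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Structural check for EUC-CN: bytes below 0x80 are ASCII, every other
// byte must be a lead byte 0xA1..0xF7 followed by a trail byte 0xA1..0xFE.
// Whether the pair is actually assigned in GB2312 is not checked here.
bool looksLikeEucCn(const std::string& bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char lead = bytes[i];
        if (lead < 0x80) continue;
        if (lead < 0xA1 || lead > 0xF7 || i + 1 >= bytes.size()) return false;
        const unsigned char trail = bytes[++i];
        if (trail < 0xA1 || trail > 0xFE) return false;
    }
    return true;
}
```

Under EUC-CN the pair A9 D6 is read as one double-byte character followed by "tzi", while ISO-8859-1 reads the same five bytes as “©Ötzi”; neither decoder reports an error.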
>
> What I can imagine doing to reduce the pain in Qt Creator is letting it
> remember the encoding used for a file (if it differs from the default)
> and use it when reopening the file.
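Remembering a non-default encoding per file boils down to a small lookup keyed by file path. The class below is a hypothetical sketch (EncodingMemory and its methods are not Qt Creator API); persisting the map, for example with the session, is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-file encoding memory: only files whose encoding
// differs from the default are recorded.
class EncodingMemory {
public:
    explicit EncodingMemory(std::string defaultCodec)
        : m_default(std::move(defaultCodec)) {}

    // Called after the user picks an encoding for filePath.
    void remember(const std::string& filePath, const std::string& codec) {
        if (codec == m_default)
            m_perFile.erase(filePath);  // back to default: nothing to store
        else
            m_perFile[filePath] = codec;
    }

    // Codec to try first when (re)opening filePath.
    std::string codecFor(const std::string& filePath) const {
        auto it = m_perFile.find(filePath);
        return it != m_perFile.end() ? it->second : m_default;
    }

private:
    std::string m_default;
    std::map<std::string, std::string> m_perFile;  // path -> codec name
};
```

Storing only deviations keeps the map small, and files that were never switched keep following the default if that setting changes later.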
>
> Maybe also a “fallback” encoding setting, offered as a quick-access
> button directly in the “cannot open with encoding” info bar (so there
> would be buttons “select encoding” and “use XYZ”), for people who
> regularly have to handle one additional “funny” encoding (default: ISO
> Latin 1?). Would that be considered helpful?
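The proposed info bar could decide which buttons to show roughly as below. infoBarButtons and the DecodeCheck hook are hypothetical; in Qt Creator the check would be an actual decode attempt through QTextCodec, which this sketch does not model.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hook: does `bytes` decode without errors in `codec`?
using DecodeCheck = std::function<bool(const std::string& bytes, const std::string& codec)>;

// Buttons for the "cannot open with encoding" info bar; an empty result
// means the file opened cleanly and no info bar is needed.
std::vector<std::string> infoBarButtons(const std::string& bytes,
                                        const std::string& defaultCodec,
                                        const std::string& fallbackCodec,
                                        const DecodeCheck& decodes) {
    std::vector<std::string> buttons;
    if (decodes(bytes, defaultCodec))
        return buttons;
    buttons.push_back("Select Encoding");  // opens the full codec list
    if (!fallbackCodec.empty() && fallbackCodec != defaultCodec && decodes(bytes, fallbackCodec))
        buttons.push_back("Use " + fallbackCodec);  // one-click fallback
    return buttons;
}
```

If the fallback is ISO Latin 1, the “Use ISO-8859-1” button appears for every file that fails with the default codec, because Latin-1 never reports errors. The button is a shortcut for a user who already knows the file's encoding, not a detector.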
>
> Br, Eike
>
>> Frederic
>> _______________________________________________
>> Qt-creator mailing list
>> Qt-creator at qt-project.org
>> http://lists.qt-project.org/mailman/listinfo/qt-creator
>

