[Qt-interest] Heuristics for determining text codec of file
suzuki toshiya
mpsuzuki at hiroshima-u.ac.jp
Tue Jan 11 07:05:25 CET 2011
Hi,
Although I have never tried to build or use it myself, I have heard
that Mozilla's character encoding detector can be built as a
standalone module:
Very old description:
http://www.mozilla.org/projects/intl/detectorsrc.html
source code:
http://hg.mozilla.org/mozilla-central/file/3ac595ba8c43/extensions/universalchardet
If Mozilla's detection looks sufficient for your needs,
please give it a try.
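As a cheap first pass before trying a full detector, one could check
for a BOM and, if there is none, test whether the sample is well-formed
UTF-8. The following is only a minimal sketch in plain C++ under my own
assumptions: guessEncoding is a hypothetical helper name (it is neither
a Qt nor a Mozilla function), the UTF-8 check is simplified, and it
makes no attempt to tell legacy 8-bit or multibyte encodings apart,
which is where a statistical detector such as Mozilla's would come in.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt or Mozilla): guess an encoding name
// from a byte sample, e.g. the first 4K of a file.
// Returns "unknown" when the bytes are neither BOM-marked nor valid UTF-8.
std::string guessEncoding(const std::string &data)
{
    const unsigned char *p =
        reinterpret_cast<const unsigned char *>(data.data());
    const std::size_t n = data.size();

    // 1. Byte order marks. UTF-32LE is tested before UTF-16LE because
    //    its BOM begins with the same two bytes (FF FE).
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return "UTF-8";
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return "UTF-32LE";
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return "UTF-32BE";
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return "UTF-16LE";
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return "UTF-16BE";

    // 2. No BOM: walk the bytes and check UTF-8 lead/continuation structure.
    //    Simplified: overlong forms and surrogates are not all rejected.
    bool sawHighBit = false;
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) {          // plain ASCII byte
            ++i;
            continue;
        }
        sawHighBit = true;

        std::size_t extra;       // number of continuation bytes expected
        if (c >= 0xC2 && c <= 0xDF)
            extra = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;
        else
            return "unknown";    // stray continuation or invalid lead byte

        for (std::size_t k = 1; k <= extra; ++k) {
            if (i + k >= n)
                return "UTF-8";  // sample was cut mid-character: tolerate
            if ((p[i + k] & 0xC0) != 0x80)
                return "unknown";
        }
        i += extra + 1;
    }
    return sawHighBit ? "UTF-8" : "US-ASCII";
}
```

If the result is "unknown", the application could fall back to the
locale's codec or ask the user, and the returned name could presumably
be handed to QTextCodec::codecForName() in the other cases.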
Regards,
mpsuzuki
Robert Hairgrove wrote:
> I am importing data from text files in the application I am writing.
> However, users are not necessarily aware of the codec/encoding of their
> files, so I would like to try to guess it within the application (I know
> this isn't 100% reliable) but let the user override it if they do know
> what they are doing.
>
> Looking at the documentation for QTextCodec and related functions, it
> seems that most of the "convert..." functions expect the data to have a
> byte order mark (BOM). But this is usually not the case on non-Windows
> systems (don't know about Mac these days).
>
> Is there a library freely available which can take, for example, the
> first 4K bytes of text and scan it for extended or Unicode characters,
> returning the probable codec used? I'm sure this would come in handy for
> many people. :)
>
> Thank you.