[Qt-interest] Heuristics for determining text codec of file
Robert Hairgrove
evorgriahr at hispeed.ch
Tue Jan 11 11:21:23 CET 2011
Thank you very much! Looks promising to me.
--
On Tue, 2011-01-11 at 15:05 +0900, suzuki toshiya wrote:
> Hi,
>
> Although I've never tried to build or use it, I have heard
> that Mozilla's character encoding detector can be built
> as a standalone module:
>
> Very old description:
> http://www.mozilla.org/projects/intl/detectorsrc.html
>
> source code:
> http://hg.mozilla.org/mozilla-central/file/3ac595ba8c43/extensions/universalchardet
>
> If Mozilla's detection is good enough for your needs,
> please give it a try.
>
> Regards,
> mpsuzuki
>
> Robert Hairgrove wrote:
> > I am importing data from text files in the application I am writing.
> > However, users are not necessarily aware of the codec/encoding of their
> > files, so I would like to try to guess it within the application (I know
> > this isn't 100% reliable) but let the user override it if they do know
> > what they are doing.
> >
> > Looking at the documentation for QTextCodec and related functions, it
> > seems that most of the "convert..." functions expect the data to have a
> > byte order mark (BOM). But files usually lack one on non-Windows
> > systems (I don't know about the Mac these days).
> >
> > Is there a library freely available which can take, for example, the
> > first 4K bytes of text and scan it for extended or Unicode characters,
> > returning the probable codec used? I'm sure this would come in handy for
> > many people. :)
> >
> > Thank you.
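The kind of scan asked about above can be approximated with a simple heuristic, sketched here under stated assumptions rather than as what universalchardet or QTextCodec actually do. The hypothetical function `guessEncoding()` checks for a UTF-8 or UTF-16 byte order mark, then tests whether the sample is (loosely) valid UTF-8, and otherwise falls back to ISO-8859-1. The fallback choice is an assumption (Windows-1252 may suit some users better), UTF-32 BOMs are not handled, and the UTF-8 check skips overlong and surrogate checks for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: guesses an encoding name (IANA-style label) from the
// first few KB of a file. Heuristic only; the user should be able to override it.
std::string guessEncoding(const std::string& sample)
{
    const auto* p = reinterpret_cast<const unsigned char*>(sample.data());
    const std::size_t n = sample.size();

    // 1. A byte order mark, when present, settles the question.
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return "UTF-8";
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return "UTF-16LE";
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return "UTF-16BE";

    // 2. Walk the bytes as UTF-8; any malformed sequence means "not UTF-8".
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }          // plain ASCII byte

        std::size_t len;                          // length of the multi-byte sequence
        if (c >= 0xC2 && c <= 0xDF)      len = 2;
        else if (c >= 0xE0 && c <= 0xEF) len = 3;
        else if (c >= 0xF0 && c <= 0xF4) len = 4;
        else return "ISO-8859-1";                 // byte can never start UTF-8

        // Continuation bytes must look like 10xxxxxx. A sequence cut off by the
        // end of the sample is tolerated, since the sample may split a character.
        for (std::size_t k = 1; k < len && i + k < n; ++k)
            if ((p[i + k] & 0xC0) != 0x80) return "ISO-8859-1";
        i += len;
    }

    // Pure ASCII is also valid UTF-8, so UTF-8 is a safe answer for it.
    return "UTF-8";
}
```

For example, `guessEncoding("caf\xC3\xA9")` yields `"UTF-8"`, while `guessEncoding("caf\xE9 au lait")` yields `"ISO-8859-1"`. In a Qt 4 application the returned name could be passed to `QTextCodec::codecForName()` and the resulting codec's `toUnicode()` applied to the file contents. Validating UTF-8 first works well in practice because Latin-1 text containing accented letters rarely forms valid UTF-8 sequences by accident; telling apart the various 8-bit codecs, however, needs statistical detection such as the Mozilla module mentioned above.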