[Development] Regular expression libraries for QRegExp

Giuseppe D'Angelo dangelog at gmail.com
Mon Nov 21 16:45:49 CET 2011


On 16 November 2011 16:08,  <marius.storm-olsen at nokia.com> wrote:
> Yes, the implementation based on UTF-8 vs UTF-16 version of PCRE would
> only differ on two lines, the UTF-16 -> UTF-8 and UTF-8 -> UTF-16
> conversion before and after the matching.
>
> I suggest we get started on this with the current version of PCRE, and
> hope that entices the PCRE team to work on a proper UTF-16 implementation.
>
> Anyone interested in jumping on this task?

I can volunteer some time :)

But first: do we all (esp. Thiago, Lars) agree to use the UTF-8
version for now (and pay for converting the pattern, the subject
string and the match offsets), and then write and enable a UTF-16
codepath once PCRE ships with proper support for it (by detecting its
version at runtime)?
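
As a rough illustration of the offsets conversion: with the UTF-8
build, PCRE reports match positions as byte offsets into the UTF-8
subject, so they have to be mapped back to UTF-16 positions in the
original string. The sketch below uses only standard C++;
`Utf8Subject` and `toUtf8WithOffsets` are hypothetical names, not Qt
or PCRE API, and a real implementation would presumably rely on
QString's own conversion routines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: the UTF-8 copy of a subject plus a table
// mapping each UTF-8 byte offset back to a UTF-16 code unit offset.
struct Utf8Subject {
    std::string bytes;
    // utf16Offset[b] is the UTF-16 offset of the code point that owns
    // byte b; the final entry maps the end of the string.
    std::vector<std::size_t> utf16Offset;
};

Utf8Subject toUtf8WithOffsets(const std::u16string &in)
{
    Utf8Subject out;
    std::size_t i = 0;
    while (i < in.size()) {
        char32_t cp = in[i];
        std::size_t units = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into a single code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            units = 2;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // lone surrogate: substitute U+FFFD
        }

        if (cp < 0x80) {
            out.bytes += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out.bytes += static_cast<char>(0xC0 | (cp >> 6));
            out.bytes += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.bytes += static_cast<char>(0xE0 | (cp >> 12));
            out.bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out.bytes += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out.bytes += static_cast<char>(0xF0 | (cp >> 18));
            out.bytes += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out.bytes += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out.bytes += static_cast<char>(0x80 | (cp & 0x3F));
        }

        // Every byte just appended belongs to the code point at UTF-16
        // offset i.
        out.utf16Offset.resize(out.bytes.size(), i);
        i += units;
    }
    out.utf16Offset.push_back(in.size()); // end-of-string offset
    return out;
}
```

For a subject such as u"a\u00E9b", a match reported at UTF-8 bytes
[3, 4) maps back to UTF-16 positions [2, 3). Building the table costs
one extra pass and one entry per byte, which is part of the price of
the UTF-8 route.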

Also: what's the minimum PCRE version Qt should require? I see that
Debian 6 (stable) ships 8.02 [1] and Ubuntu 10.04 LTS ships 7.8 [2];
for other distributions, of course, YMMV. Is it OK to depend on even
more recent versions? For instance, PCRE 8.10 adds UCP support
(basically making \w, \d, etc. match the corresponding Unicode
properties), and PCRE 8.20 adds a JIT feature (which promises large
performance benefits) [3] [4].
Again: should we depend on an "old" version, detect the actual one at
runtime, and optionally enable those features?
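
A minimal sketch of what such runtime detection might look like.
PCRE's pcre_version() returns a string that starts with the version
number (for example "8.20" followed by a release date), so the
sketch parses a string of that shape and derives feature flags from
the version thresholds mentioned above. `PcreFeatures` and
`featuresFromVersion` are hypothetical names; the function takes the
string as a parameter so that the example does not need to link
against PCRE.

```cpp
#include <cstdlib>

// Hypothetical feature summary derived from the PCRE version string.
struct PcreFeatures {
    int major = 0;
    int minor = 0;
    bool ucp = false; // Unicode properties for \w, \d, ...: 8.10 and later
    bool jit = false; // JIT compiler: 8.20 and later
};

// Parses the leading "major.minor" of a string shaped like the return
// value of pcre_version(), e.g. "8.02 ..." or "7.8 ...".
PcreFeatures featuresFromVersion(const char *version)
{
    PcreFeatures f;
    char *end = nullptr;
    f.major = static_cast<int>(std::strtol(version, &end, 10));
    if (end && *end == '.')
        f.minor = static_cast<int>(std::strtol(end + 1, nullptr, 10));

    // 8.x releases use two-digit minors (8.00, 8.02, 8.10, 8.20), so a
    // plain integer comparison is enough within the 8 series.
    f.ucp = f.major > 8 || (f.major == 8 && f.minor >= 10);
    f.jit = f.major > 8 || (f.major == 8 && f.minor >= 20);
    return f;
}
```

In real code one would call featuresFromVersion(pcre_version()). The
version check alone is probably not sufficient for the JIT: a library
that is new enough can still be built without it, so a follow-up
query such as pcre_config(PCRE_CONFIG_JIT, ...) would likely be
needed as well.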

About the API itself: would you prefer three classes (raw pattern
-> compiled pattern -> result of a match), or only two (pattern ->
result of a match)?
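
To make the two options concrete, here is a rough sketch of both
shapes. Every class name is hypothetical, and std::regex merely
stands in for the PCRE backend so that the example is self-contained;
it is not a proposal for the actual Qt API.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Result of a match: stores copies of the captured substrings.
class MatchResult {
public:
    MatchResult() = default;
    explicit MatchResult(const std::smatch &m) {
        for (std::size_t i = 0; i < m.size(); ++i)
            captures_.push_back(m.str(i));
    }
    bool hasMatch() const { return !captures_.empty(); }
    std::string captured(std::size_t n = 0) const {
        return n < captures_.size() ? captures_[n] : std::string();
    }
private:
    std::vector<std::string> captures_;
};

// Option A, three classes: compilation is an explicit, visible step.
class CompiledPattern {
public:
    explicit CompiledPattern(const std::string &text) : re_(text) {}
    MatchResult match(const std::string &subject) const {
        std::smatch m;
        return std::regex_search(subject, m, re_) ? MatchResult(m)
                                                  : MatchResult();
    }
private:
    std::regex re_;
};

class Pattern {
public:
    explicit Pattern(std::string text) : text_(std::move(text)) {}
    CompiledPattern compile() const { return CompiledPattern(text_); }
private:
    std::string text_;
};

// Option B, two classes: the pattern compiles itself on first use and
// hides the compiled state.
class LazyPattern {
public:
    explicit LazyPattern(std::string text) : text_(std::move(text)) {}
    MatchResult match(const std::string &subject) const {
        if (!compiled_) {
            re_ = std::regex(text_);
            compiled_ = true;
        }
        std::smatch m;
        return std::regex_search(subject, m, re_) ? MatchResult(m)
                                                  : MatchResult();
    }
private:
    std::string text_;
    mutable std::regex re_;
    mutable bool compiled_ = false;
};
```

With option A the caller writes Pattern("(\\d+)-(\\d+)").compile()
once and reuses the compiled object; with option B the caller just
calls match() and the compilation cost is paid implicitly on the
first call. Note that the lazy variant as sketched is not thread-safe.
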
-- 
Giuseppe D'Angelo

[1] http://packages.debian.org/squeeze/libpcre3
[2] http://packages.ubuntu.com/lucid/libpcre3
[3] http://www.pcre.org/changelog.txt
[4] http://www.pcre.org/news.txt


