[Development] Regular expression libraries for QRegExp

Thiago Macieira thiago at kde.org
Mon Nov 21 18:29:26 CET 2011


On Monday, 21 November 2011 15.45.49, Giuseppe D'Angelo wrote:
> But first: do we all (esp. Thiago, Lars) agree to use the UTF-8
> version for now (and pay for the pattern/subject string/offsets
> conversions) and then write and enable a UTF-16 codepath when PCRE
> ships with proper support for it (by detecting its version at
> runtime)?

Yes.

Also note that it might be easier to convert to UTF-8, execute the RE, and then
scan forward through the UTF-16 string, counting the number of UTF-8 bytes each
character takes, to map only the offsets that we *do* need (match offset and
captures).
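
To make that concrete, here is a rough sketch of the forward scan. It is not
Qt or PCRE code: the names utf8Length and mapOffsets are made up for
illustration, std::u16string stands in for QString, and it assumes the byte
offsets are sorted in ascending order and fall on character boundaries (PCRE
reports captures in group order, so the caller would have to sort them first).
It also assumes a lone surrogate was converted to a 3-byte sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns how many UTF-8 bytes the character starting at s[i] occupies and
// stores in 'units' how many UTF-16 code units it occupies (1 or 2).
static std::size_t utf8Length(const std::u16string &s, std::size_t i,
                              std::size_t &units)
{
    const char16_t c = s[i];
    units = 1;
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    // A valid surrogate pair encodes one code point above U+FFFF: 4 bytes.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
        units = 2;
        return 4;
    }
    // Rest of the BMP (assumption: a lone surrogate also became 3 bytes).
    return 3;
}

// Maps sorted UTF-8 byte offsets to UTF-16 code unit offsets in one pass
// over the subject, without building a full offset table.
std::vector<std::size_t> mapOffsets(const std::u16string &subject,
                                    const std::vector<std::size_t> &byteOffsets)
{
    std::vector<std::size_t> result;
    result.reserve(byteOffsets.size());
    std::size_t bytes = 0; // UTF-8 bytes consumed so far
    std::size_t i = 0;     // UTF-16 code units consumed so far
    for (std::size_t wanted : byteOffsets) {
        while (bytes < wanted && i < subject.size()) {
            std::size_t units = 1;
            bytes += utf8Length(subject, i, units);
            i += units;
        }
        assert(bytes == wanted); // offset must sit on a character boundary
        result.push_back(i);
    }
    return result;
}
```

The cost is one linear pass up to the largest offset we care about, so a
match near the start of a long subject stays cheap.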

> Also: what's the minimum PCRE version Qt should require? I see that
> Debian 6 (stable) uses 8.02 [1], Ubuntu 10.04 LTS uses 7.8 [2]. For
> other distributions of course YMMV. Is it OK to depend on even more
> recent versions? For instance, PCRE 8.10 adds UCP support (basically
> makes \w, \d etc. match the corresponding Unicode properties), and PCRE
> 8.20 adds a JIT feature (which promises large performance benefits) [3]
> [4].
> Again: should we resort to depending on an "old" version, detect the
> actual one at runtime, and optionally enable those features?

I don't know. We should choose the features we want and then require that. 
Unicode matching sounds interesting.

> About the API itself: would you prefer three classes (raw pattern
> -> compiled pattern -> result of a match), or only two (pattern ->
> result of a match)?

Two sounds better. I don't see the point in having a distinction between a raw 
and a compiled pattern. We might just need a pattern class and simply have a 
method to compile it.
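
Something along these lines is what I have in mind. This is only a sketch of
the shape of the API: the class names Pattern and Match and all of their
methods are hypothetical, and std::regex stands in for PCRE purely so the
example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Result of a match: owns copies of the captured substrings.
class Match {
public:
    Match() = default;
    explicit Match(const std::smatch &m)
    {
        for (const auto &sub : m)
            captures_.push_back(sub.str());
    }
    bool hasMatch() const { return !captures_.empty(); }
    // Capture 0 is the whole match; out-of-range indexes give an empty string.
    std::string captured(std::size_t n = 0) const
    {
        return n < captures_.size() ? captures_[n] : std::string();
    }

private:
    std::vector<std::string> captures_;
};

// One pattern class: holds the source text and compiles it on demand.
class Pattern {
public:
    explicit Pattern(std::string source) : source_(std::move(source)) {}

    // Explicit compile step; returns false if the pattern is invalid.
    bool compile()
    {
        if (compiled_)
            return true;
        try {
            regex_.assign(source_);
            compiled_ = true;
        } catch (const std::regex_error &) {
            compiled_ = false;
        }
        return compiled_;
    }

    // Compiles lazily if compile() was not called before.
    Match match(const std::string &subject)
    {
        std::smatch m;
        if (!compile() || !std::regex_search(subject, m, regex_))
            return Match();
        return Match(m);
    }

private:
    std::string source_;
    std::regex regex_;      // stand-in for the compiled PCRE pattern
    bool compiled_ = false;
};
```

The user only ever sees a pattern and a match result; whether the pattern has
been compiled yet is an internal detail, and calling compile() up front is
just a way to catch errors early.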

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358