[Development] Regular expression libraries for QRegExp

lars.knoll at nokia.com
Mon Nov 21 22:25:07 CET 2011


On 11/21/11 6:29 PM, "ext Thiago Macieira" <thiago at kde.org> wrote:

>On Monday, 21 de November de 2011 15.45.49, Giuseppe D'Angelo wrote:
>> But first: do we all (esp. Thiago, Lars) agree to use the UTF-8
>> version for now (and pay for the pattern/subject string/offsets
>> conversions) and then write and enable a UTF-16 codepath when PCRE
>> ships with proper support for it (by detecting its version at
>> runtime)?
>
>Yes.

And yes.
>
>Also note that it might be easier to convert to UTF-8, execute the RE and
>then scan forward the UTF-16 string by counting the number of UTF-8 bytes
>per character and map the offsets that we *do* need (match offset and
>captures).
>
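
The forward scan Thiago describes could look roughly like the sketch below.
It is only an illustration under stated assumptions: mapOffsets and
utf8Length are hypothetical names, std::u16string stands in for the UTF-16
subject, and the byte offsets are assumed to be sorted in ascending order
(real capture offsets would need sorting or one scan per offset).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of UTF-8 bytes needed for the code point that
// starts at s[i]. Sets `units` to the UTF-16 code units it occupies (1 or 2).
static std::size_t utf8Length(const std::u16string &s, std::size_t i,
                              std::size_t &units)
{
    const char16_t c = s[i];
    units = 1;
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    // A high surrogate followed by a low surrogate is one code point
    // outside the BMP: 2 UTF-16 units, 4 UTF-8 bytes.
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
        units = 2;
        return 4;
    }
    // Rest of the BMP. Assumption: a lone surrogate was converted to a
    // 3-byte sequence (such as U+FFFD) by the UTF-8 encoder.
    return 3;
}

// Hypothetical helper: maps UTF-8 byte offsets (assumed sorted ascending)
// to UTF-16 code unit offsets in a single forward scan of the subject.
std::vector<std::size_t> mapOffsets(const std::u16string &subject,
                                    const std::vector<std::size_t> &utf8Offsets)
{
    std::vector<std::size_t> result;
    result.reserve(utf8Offsets.size());
    std::size_t bytePos = 0;  // position in the (virtual) UTF-8 string
    std::size_t unitPos = 0;  // position in the UTF-16 string
    for (std::size_t wanted : utf8Offsets) {
        while (bytePos < wanted && unitPos < subject.size()) {
            std::size_t units = 1;
            bytePos += utf8Length(subject, unitPos, units);
            unitPos += units;
        }
        result.push_back(unitPos);
    }
    return result;
}
```

The scan never materialises a full offset table; it only walks as far as the
last offset that is actually requested.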
>> Also: what's the minimum PCRE version Qt should require? I see that
>> Debian 6 (stable) uses 8.02 [1], Ubuntu 10.04 LTS uses 7.8 [2]. For
>> other distributions of course YMMV. Is it OK to depend on even more
>> recent versions? For instance, PCRE 8.10 adds UCP support (basically
>> make \w \d etc. match the corresponding Unicode properties), and PCRE
>> 8.20 adds a JIT feature (which promises large performance benefits) [3]
>> [4].
>> Again: should we depend on an "old" version, detect the actual one
>> at runtime, and optionally enable those features?
>
>I don't know. We should choose the features we want and then require
>that. Unicode matching sounds interesting.

As does the JIT. Do you have an idea of how much bigger PCRE gets with
these features?
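
For the "detect at runtime" option, a minimal sketch of the version check
might look like this. PcreFeatures and featuresForVersion are hypothetical
names. The string is assumed to start with "major.minor", which is what I
believe pcre_version() returns. The thresholds (8.10 for UCP, 8.20 for the
JIT) are the ones quoted above. A real check would probably also have to ask
the library whether the feature was compiled in at all.

```cpp
#include <cstdio>
#include <string>

// Hypothetical feature set derived from the library version alone.
struct PcreFeatures {
    bool ucp;  // Unicode properties for \w, \d etc. (8.10 and later)
    bool jit;  // JIT compilation (8.20 and later)
};

// Hypothetical helper: parses a version string that starts with
// "major.minor" and reports which optional features that version offers.
PcreFeatures featuresForVersion(const std::string &version)
{
    PcreFeatures f = { false, false };
    int verMajor = 0;
    int verMinor = 0;
    // Anything that does not start with two dot-separated numbers is
    // treated as "no optional features".
    if (std::sscanf(version.c_str(), "%d.%d", &verMajor, &verMinor) != 2)
        return f;
    f.ucp = verMajor > 8 || (verMajor == 8 && verMinor >= 10);
    f.jit = verMajor > 8 || (verMajor == 8 && verMinor >= 20);
    return f;
}
```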

>
>> About the API itself: would you prefer three classes (raw pattern
>> -> compiled pattern -> result of a match), or only two (pattern ->
>> result of a match)?
>
>Two sounds better. I don't see the point in having a distinction between
>a raw and a compiled pattern. We might just need a pattern class and
>simply have a method to compile it.

Agree with Thiago.
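
To make the two-class shape concrete, here is a rough sketch. Pattern and
MatchResult are hypothetical names, not a proposed Qt API, and std::regex
is only a stand-in for the real engine so that the example is
self-contained.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result class: holds the captured substrings of one match.
class MatchResult {
public:
    MatchResult() {}
    explicit MatchResult(const std::smatch &m)
    {
        // Copy the captures so the result does not depend on the subject
        // string staying alive.
        for (std::size_t i = 0; i < m.size(); ++i)
            captures_.push_back(m[i].str());
    }
    bool hasMatch() const { return !captures_.empty(); }
    std::string captured(std::size_t n = 0) const
    {
        return n < captures_.size() ? captures_[n] : std::string();
    }
private:
    std::vector<std::string> captures_;
};

// Hypothetical pattern class: keeps the raw pattern and compiles it on
// demand, so no separate "compiled pattern" class is needed.
class Pattern {
public:
    explicit Pattern(const std::string &pattern)
        : pattern_(pattern), compiled_(false) {}

    // Explicit compile step; throws std::regex_error on an invalid pattern.
    void compile()
    {
        if (!compiled_) {
            regex_.assign(pattern_);
            compiled_ = true;
        }
    }

    MatchResult match(const std::string &subject)
    {
        compile();  // lazy: compiles on first use if not done explicitly
        std::smatch m;
        return std::regex_search(subject, m, regex_) ? MatchResult(m)
                                                     : MatchResult();
    }
private:
    std::string pattern_;
    std::regex regex_;
    bool compiled_;
};
```

A caller would construct a Pattern, optionally call compile() up front, and
then inspect the MatchResult returned by match().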

One interesting piece will be how we continue with this. We will most
likely need one regexp engine in QtCore (as regexps are used internally
in Qt in a couple of places). If we move QRegExp out, we still need to
consider how to keep source compatibility. For inline code we can use
some template magic to solve this, but it's a bit harder for code living
in .cpp files (e.g. QDir and QDirIterator).

Cheers,
Lars