[Development] Regular expression libraries for QRegExp

Giuseppe D'Angelo dangelog at gmail.com
Fri Nov 25 04:50:37 CET 2011


On 24 November 2011 22:49, Oswald Buddenhagen
<oswald.buddenhagen at nokia.com> wrote:
> On Wed, Nov 23, 2011 at 03:35:57PM +0100, ext Thiago Macieira wrote:
>> At first thought, I'd say that the pattern class should be a regular,
>> implicitly-shared, atomic copy-on-write value class. If you call a non-const
>> method, it detaches.
>>
>> There should be no const methods that modify internal caches. Period. If you
>> compile the pattern, it's a non-const method and it detaches.
>>
> that makes lazy compilation really tough to implement ...

The problem is the API. Given that every method that executes the
regexp has to compile it first, would a user reasonably expect const
methods (on a value-based, implicitly shared class) such as isValid(),
indexIn(), exactMatch(), etc. to cache the compiled pattern or not?

Where, in the following lines, should the pattern be compiled? Should
the compilation result be cached at that point?

QRegExp rx("a complicated regexp");
if (rx.isValid()) {
    index1 = rx.indexIn(str1);
    // will this pay for another compilation of the pattern?
    index2 = rx.indexIn(str2);
}

Which of the following would you prefer:
1) the const methods never update the internal cache. If there's a
cached compiled pattern, it's used; otherwise the pattern gets
compiled locally, executed, and everything is thrown away before
returning a result to the user. A non-const compile() method detaches,
compiles and caches the pattern;
2) the const methods are allowed to update the internal cache, so if
the pattern needs to be compiled then it's also cached (=> no
compile() or similar methods whatsoever)
?
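
To make the trade-off concrete, here is a minimal sketch of the two
options. It uses std::regex instead of a real regexp backend so that it
is self-contained; the class names PatternExplicit and PatternLazy, the
contains() method and the compilation counter are all hypothetical and
exist only for illustration. Implicit sharing and detaching are left
out for brevity.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <utility>

// Hypothetical illustration only; this is not how QRegExp is implemented.
static int g_compilations = 0; // counts how many times a pattern is compiled

static std::regex compilePattern(const std::string &pattern)
{
    ++g_compilations;
    return std::regex(pattern);
}

// Option 1: const methods never update the cache.
class PatternExplicit
{
public:
    explicit PatternExplicit(std::string pattern) : m_pattern(std::move(pattern)) {}

    // Non-const: compiles and caches (a real shared class would detach here).
    void compile() { m_compiled = std::make_shared<const std::regex>(compilePattern(m_pattern)); }

    bool contains(const std::string &subject) const
    {
        if (m_compiled)
            return std::regex_search(subject, *m_compiled);
        // No cache: compile locally, match, throw the result away.
        const std::regex local = compilePattern(m_pattern);
        return std::regex_search(subject, local);
    }

private:
    std::string m_pattern;
    std::shared_ptr<const std::regex> m_compiled;
};

// Option 2: const methods may fill the cache lazily.
class PatternLazy
{
public:
    explicit PatternLazy(std::string pattern) : m_pattern(std::move(pattern)) {}

    bool contains(const std::string &subject) const
    {
        if (!m_compiled) // note: needs synchronization to be thread-safe
            m_compiled = std::make_shared<const std::regex>(compilePattern(m_pattern));
        return std::regex_search(subject, *m_compiled);
    }

private:
    std::string m_pattern;
    mutable std::shared_ptr<const std::regex> m_compiled;
};

// Returns the number of compilations triggered by two searches (option 1).
int compilationsExplicit(bool precompile)
{
    g_compilations = 0;
    PatternExplicit rx("a+b");
    if (precompile)
        rx.compile();
    rx.contains("xaab");
    rx.contains("ab");
    return g_compilations;
}

// Same measurement for option 2.
int compilationsLazy()
{
    g_compilations = 0;
    const PatternLazy rx("a+b");
    rx.contains("xaab");
    rx.contains("ab");
    return g_compilations;
}
```

With option 1, a user who forgets to call compile() pays for one
compilation per search; with option 2 the cost is paid once, but the
const method writes to shared state behind the user's back.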

My original idea was to add a level of indirection (together with
regexps, this should solve all the world's problems...!). A class holds
the pattern and the related options (/imsx flags, usage of Unicode
properties, etc.); a const compile() method returns a matcher object
(which wraps the compiled pattern), that is, an object which can be
used to perform matches, with const isValid, indexIn, exactMatch,
etc.; the result of a match is an object that tells whether the match
succeeded and, if so, gives back the match start, the length, the
captured substrings, etc. All three classes (pattern, matcher, result)
are value-based, implicitly shared classes (the last two probably
don't even have setters).
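
A rough sketch of how that three-class split could look, again built
on std::regex so that it is self-contained. The names Pattern, Matcher
and MatchResult, and every method on them, are assumptions made up for
this example and not a proposed Qt API; a shared_ptr to a const regex
stands in for Qt-style implicit sharing.

```cpp
#include <cstddef>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Result of one match attempt: read-only, no setters.
class MatchResult
{
public:
    MatchResult() = default; // represents "no match"
    MatchResult(std::ptrdiff_t start, std::vector<std::string> captures)
        : m_start(start), m_captures(std::move(captures)) {}

    bool hasMatch() const { return m_start >= 0; }
    std::ptrdiff_t start() const { return m_start; }
    std::size_t length() const { return m_captures.empty() ? 0 : m_captures[0].size(); }
    // Capture 0 is the whole match; out-of-range indexes give an empty string.
    std::string captured(std::size_t i) const
    {
        return i < m_captures.size() ? m_captures[i] : std::string();
    }

private:
    std::ptrdiff_t m_start = -1;
    std::vector<std::string> m_captures;
};

// Wraps the compiled pattern; every method is const.
class Matcher
{
public:
    Matcher(const std::string &pattern, std::regex::flag_type flags)
    {
        try {
            m_regex = std::make_shared<const std::regex>(pattern, flags);
        } catch (const std::regex_error &) {
            // Invalid pattern: leave m_regex null so isValid() reports it.
        }
    }

    bool isValid() const { return m_regex != nullptr; }

    MatchResult match(const std::string &subject) const
    {
        std::smatch m;
        if (!isValid() || !std::regex_search(subject, m, *m_regex))
            return MatchResult();
        std::vector<std::string> captures;
        for (const auto &sub : m)
            captures.push_back(sub.str());
        return MatchResult(m.position(0), std::move(captures));
    }

private:
    std::shared_ptr<const std::regex> m_regex; // copies share the compiled pattern
};

// Holds the pattern text and its options; compile() is const.
class Pattern
{
public:
    explicit Pattern(std::string text, bool caseInsensitive = false)
        : m_text(std::move(text)), m_caseInsensitive(caseInsensitive) {}

    Matcher compile() const
    {
        std::regex::flag_type flags = std::regex::ECMAScript;
        if (m_caseInsensitive)
            flags = flags | std::regex::icase;
        return Matcher(m_text, flags);
    }

private:
    std::string m_text;
    bool m_caseInsensitive;
};
```

Since compile() returns a new object instead of modifying the pattern,
no const method needs a hidden cache: the user decides how long the
compiled form lives by keeping the Matcher around.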

Cheers,
-- 
Giuseppe D'Angelo


