[Development] Regular expression libraries for QRegExp

Olivier Goffart olivier at woboq.com
Fri Nov 25 08:28:22 CET 2011


On Friday 25 November 2011 03:50:37 Giuseppe D'Angelo wrote:
> On 24 November 2011 22:49, Oswald Buddenhagen
> <oswald.buddenhagen at nokia.com> wrote:
> > On Wed, Nov 23, 2011 at 03:35:57PM +0100, ext Thiago Macieira wrote:
> >> At first thought, I'd say that the pattern class should be a regular,
> >> implicitly-shared, atomic copy-on-write value class. If you call a
> >> non-const method, it detaches.
> >> 
> >> There should be no const methods that modify internal caches. Period.
> >> If you compile the pattern, it's a non-const method and it detaches.
> > 
> > that makes lazy compilation really tough to implement ...
> 
> The problem is the API... Given that every method that
> somehow executes the regexp has to compile it first, can a user
> reasonably expect const methods (on a value-based, implicitly
> shared class) like isValid(), indexIn(), exactMatch(), etc. to cache
> the compiled pattern or not?
> 
> Where, in the following lines, should the pattern be compiled? Should
> the compilation result be kept in a cache at that point?
> 
> QRegExp rx("a complicated regexp");
> if (rx.isValid()) {
>     index1 = rx.indexIn(str1);
>     index2 = rx.indexIn(str2); // will pay for another compilation of the pattern?
> }

In my opinion, the user should not even need to know that the regexp needs to 
be compiled.
It is really an internal detail of how regexps work; users do not care whether 
compilation happens on every call, once, or at compile time. But they expect Qt 
to make the right choice to keep the code fast.

> In your opinion, which one do you prefer between:
> 1) the const methods never update the internal cache. If there's a
> cached compiled pattern, it's used; otherwise the pattern gets
> compiled locally, executed, and everything is thrown away before
> returning a result to the user. A non-const compile() method detaches,
> compiles and caches the pattern;
> 2) the const methods are allowed to update the internal cache, so if
> the pattern needs to be compiled then it's also cached (=> no
> compile() or similar methods whatsoever)
> ?

I'd say the const methods are allowed to update the cache.
It is perfectly fine from a semantic point of view to have mutable caches.

Of course, QRegExp has to stay reentrant. So updating the cache has to be done 
in a thread-safe way: detaching, mutex, QAtomicPointer, ... you decide.
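
To make the mutex variant concrete, here is a minimal sketch. It is not
QRegExp code: LazyPattern and compileCount() are hypothetical names, and it
uses std::regex and std::mutex only so that it compiles without Qt. The const
methods compile the pattern on first use and keep the result in state shared
by all copies, so the caller never sees a compile step.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <utility>

// Hypothetical value class (not QRegExp): const methods fill a shared,
// mutex-protected cache the first time the compiled pattern is needed.
class LazyPattern {
public:
    explicit LazyPattern(std::string pattern)
        : d_(std::make_shared<Shared>(std::move(pattern))) {}

    bool isValid() const { return compiled().valid; }

    // Offset of the first match in subject, or -1 if there is none.
    long indexIn(const std::string &subject) const {
        const Shared &s = compiled();
        if (!s.valid)
            return -1;
        std::smatch m;
        if (!std::regex_search(subject, m, s.regex))
            return -1;
        return static_cast<long>(m.position(0));
    }

    // Illustration only: how many times the pattern has been compiled.
    int compileCount() const {
        assert(d_ && "moved-from object has no shared state");
        std::lock_guard<std::mutex> lock(d_->mutex);
        return d_->compileCount;
    }

private:
    struct Shared {
        explicit Shared(std::string p) : pattern(std::move(p)) {}
        const std::string pattern;
        std::mutex mutex;       // guards the fields below during compilation
        bool attempted = false; // true once compilation has been tried
        bool valid = false;
        int compileCount = 0;
        std::regex regex;       // never modified after the first compilation
    };

    // Const, yet it may write to the cache. The pattern text never changes,
    // so the observable value of the object stays the same.
    const Shared &compiled() const {
        assert(d_ && "moved-from object has no shared state");
        std::lock_guard<std::mutex> lock(d_->mutex);
        if (!d_->attempted) {
            d_->attempted = true;
            ++d_->compileCount;
            try {
                d_->regex = std::regex(d_->pattern);
                d_->valid = true;
            } catch (const std::regex_error &) {
                d_->valid = false;
            }
        }
        return *d_;
    }

    std::shared_ptr<Shared> d_; // copies share the pattern and its cache
};
```

With something like this, the snippet quoted above would pay for a single
compilation: isValid() triggers it and both indexIn() calls reuse the result.
A real implementation would also need setters that detach, which this sketch
leaves out.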

> My original idea was to add a level of indirection (together with
> regexp this should solve all the world's problems...!). A class holds
> the pattern and the related options (/imsx flags, usage of Unicode
> properties, etc.); a const compile() method returns a matcher object
> (which wraps the compiled pattern), that is, an object which can be
> used to perform matches, with const isValid, indexIn, exactMatch,
> etc.; the result of a match is an object that tells if the match
> succeeded and, if so, gives back the match start, the length, the
> captured substrings, etc. All three of these classes (pattern,
> matcher, result) are value based, implicitly shared classes (the last
> two probably don't even have setters).

I think this makes using the regexp more complicated, for no benefit.
Qt aims at being simple to use.
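
For comparison, the three-class proposal quoted above might look roughly like
the sketch below. This is only my reading of the description: Pattern,
Matcher, MatchResult and all their members are hypothetical names, the only
option modelled is case insensitivity, and std::regex stands in for whatever
engine would actually be used.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Result of one match attempt: start, length and captured substrings.
class MatchResult {
public:
    MatchResult() = default; // represents a failed match
    MatchResult(long start, long length, std::vector<std::string> captures)
        : start_(start), length_(length), captures_(std::move(captures)) {}

    bool hasMatch() const { return start_ >= 0; }
    long start() const { return start_; }
    long length() const { return length_; }
    const std::string &captured(std::size_t i) const {
        assert(i < captures_.size());
        return captures_[i];
    }

private:
    long start_ = -1;
    long length_ = 0;
    std::vector<std::string> captures_;
};

// Wraps a compiled pattern; copies share it and there are no setters.
class Matcher {
public:
    Matcher() = default; // invalid matcher (compilation failed)
    explicit Matcher(std::shared_ptr<const std::regex> re) : re_(std::move(re)) {}

    bool isValid() const { return re_ != nullptr; }

    MatchResult match(const std::string &subject) const {
        std::smatch m;
        if (!re_ || !std::regex_search(subject, m, *re_))
            return MatchResult();
        std::vector<std::string> captures;
        for (std::size_t i = 0; i < m.size(); ++i)
            captures.push_back(m.str(i));
        return MatchResult(static_cast<long>(m.position(0)),
                           static_cast<long>(m.length(0)),
                           std::move(captures));
    }

private:
    std::shared_ptr<const std::regex> re_;
};

// Holds the pattern text and options; compile() is const and caches nothing.
class Pattern {
public:
    explicit Pattern(std::string text, bool caseInsensitive = false)
        : text_(std::move(text)), caseInsensitive_(caseInsensitive) {}

    Matcher compile() const {
        std::regex::flag_type flags = std::regex::ECMAScript;
        if (caseInsensitive_)
            flags |= std::regex::icase;
        try {
            return Matcher(std::make_shared<std::regex>(text_, flags));
        } catch (const std::regex_error &) {
            return Matcher();
        }
    }

private:
    std::string text_;
    bool caseInsensitive_;
};
```

Every call site then has to go through pattern.compile().match(str) and keep
the matcher around to avoid recompiling, which is the extra complexity I am
objecting to.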




