[Development] QRegularExpression -- first round of API review

Giuseppe D'Angelo dangelog at gmail.com
Tue Jan 17 03:43:33 CET 2012


Hi,
hopefully this will be the last round :-)

The updated header is available at [1]. The code itself is also almost
complete, and I'm writing unit tests and documentation for it; in the
meantime I've also updated some simple benchmark results [2].

I've added a full iterator class (forward only) to iterate over the
results of a global match, and a globalMatch() method in
QRegularExpression that returns such an iterator. I also renamed the
QRegularExpressionMatch methods to be more verbose, and removed the
subject() and subjectOffset() methods.
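For readers unfamiliar with the idea, the standard library's std::sregex_iterator has the same forward-only shape: each increment resumes the search right after the previous match. This is only an analogy using std::regex, not the QRegularExpression API, and the helper name allMatches is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every non-overlapping match of `pattern` in
// `subject` by walking a forward-only iterator, roughly the shape a
// globalMatch()-style API would expose.
std::vector<std::string> allMatches(const std::string &subject, const std::regex &pattern)
{
    std::vector<std::string> result;
    // Each ++it resumes the search right after the previous match;
    // a default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(subject.begin(), subject.end(), pattern), end; it != end; ++it)
        result.push_back(it->str());
    return result;
}
```

For example, allMatches("a1 b22 c333", std::regex("\\d+")) yields "1", "22" and "333", in that order.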

== Yet to be solved in QRegularExpression ==

1) Support for more pattern options and match options. Right now you
can see them commented out in the header (and there's no code for them
yet), but adding them is trivial. Any opinions on enabling all of them,
or only some? In particular, InvertedGreedinessOption would give a (more
or less) direct replacement for QRegExp::setMinimal, and
AllowDuplicatedNamesOption (see the next point) would allow duplicated
names in named capturing groups.
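To make the greediness point concrete: with QRegExp::setMinimal(true), or with an inverted-greediness option, a quantifier such as .* becomes lazy, so <.*> stops at the first '>' instead of the last (an explicit lazy *? would in turn become greedy). std::regex has no such global option, so this sketch writes the lazy quantifier explicitly; firstMatch is a hypothetical helper, not a Qt function.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the leftmost match of `pattern` in `subject`,
// or an empty string if there is none.
std::string firstMatch(const std::string &subject, const std::string &pattern)
{
    std::smatch m;
    return std::regex_search(subject, m, std::regex(pattern)) ? m.str() : std::string();
}

// Greedy: firstMatch("<a><b>", "<.*>")  yields "<a><b>" (runs to the last '>').
// Lazy:   firstMatch("<a><b>", "<.*?>") yields "<a>"    (stops at the first '>'),
//         which is what minimal / inverted-greedy matching gives for "<.*>".
```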

2) Support for named capturing groups with duplicated names. The catch
is that this may require a couple of additional accessors in
QRegularExpressionMatch to extract all the substrings captured by a
given name.
The long story is that, with the default options, using the same name
for different named capturing groups is not allowed, unless one of the
following holds:
- the proper pattern option is set;
- the named capturing groups with the same name also have the same number*;
- the (?J) option appears in the pattern string, enabling duplicated names.
* (a named capturing group still follows the "ordinary", progressive
numbering; a branch reset group (?|...), which restarts the numbering in
each branch, allows capturing groups to have the same name _and_
number).
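A minimal sketch of the kind of extra accessor this would need, assuming the match keeps a table from each name to every group number that uses it. MatchSketch and capturedTextsByName are hypothetical names for illustration only, not the QRegularExpressionMatch API. For instance, (?J)(?<d>\d+)-(?<d>\d+) matched against "12-34" has two groups named d, and the accessor should return both "12" and "34".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of a match result (NOT the QRegularExpressionMatch API).
struct MatchSketch {
    // Index = group number (0 is the whole match); std::nullopt means the
    // group did not take part in the match (e.g. it sits in an untaken branch).
    std::vector<std::optional<std::string>> groups;
    // With duplicated names allowed, one name can map to several group numbers.
    // Groups sharing a number through a branch reset (?|...) appear only once.
    std::map<std::string, std::vector<int>> names;

    // Hypothetical accessor: every substring captured by a group with this
    // name, in the order the numbers are listed, skipping unset groups.
    std::vector<std::string> capturedTextsByName(const std::string &name) const
    {
        std::vector<std::string> out;
        const auto it = names.find(name);
        if (it == names.end())
            return out;
        for (int n : it->second) {
            if (n >= 0 && static_cast<std::size_t>(n) < groups.size() && groups[n])
                out.push_back(*groups[n]);
        }
        return out;
    }
};
```

The single-string accessor that already exists would then simply return the first (or the only set) entry of this list.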

Opinions for this feature?

3) I need a couple of better names for the two partial matching modes:
- partial match preferring a complete match;
- partial match returning whatever match (partial or complete) is found first.

PCRE calls them "soft" and "hard" respectively, and those names are
quite meaningless to me.
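The difference between the two modes can be shown with a deliberately tiny model that is not how PCRE works internally: the "pattern" is an ordered list of literal alternatives, matched anchored at the start of the subject, and a partial match means the subject ran out while an alternative still needed more characters. toyMatch, PartialMode and its enumerator names are hypothetical placeholders. With alternatives "abc" and "ab" and subject "ab", the first mode keeps going and finds the complete match, while the second stops at the partial one.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class MatchResult { NoMatch, PartialMatch, CompleteMatch };

// Hypothetical names for the two modes described above.
enum class PartialMode {
    PreferComplete, // PCRE "soft": a complete match wins over a partial one
    FirstFound      // PCRE "hard": stop at the first match, partial or complete
};

// Toy matcher (NOT PCRE): `alternatives` are literal strings tried in order,
// each anchored at the start of `subject`.
MatchResult toyMatch(const std::vector<std::string> &alternatives,
                     const std::string &subject, PartialMode mode)
{
    bool sawPartial = false;
    for (const std::string &alt : alternatives) {
        // Complete: the whole alternative is a prefix of the subject.
        if (subject.size() >= alt.size() && subject.compare(0, alt.size(), alt) == 0)
            return MatchResult::CompleteMatch;
        // Partial: the (non-empty) subject ends before the alternative does,
        // and every character it has agrees with the alternative.
        if (!subject.empty() && subject.size() < alt.size()
            && alt.compare(0, subject.size(), subject) == 0) {
            if (mode == PartialMode::FirstFound)
                return MatchResult::PartialMatch;
            sawPartial = true; // keep looking for a complete match
        }
    }
    return sawPartial ? MatchResult::PartialMatch : MatchResult::NoMatch;
}
```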

== Yet to be solved in Qt itself ==

There are many uses of QRegExp around; a quick grep in qtbase reveals:
- QString and QStringList
- QObject::findChildren
- QVariant
- QSortFilterProxyModel
- QRegExpValidator
- QTextDocument
Also, QRegExp is being used inside qmake, which is a bootstrapped tool.

A solution for these must be found before QRegExp can be moved out of
qtbase.

I was planning to replace the various QString methods with template
versions (with the regexp class as the template argument), and to put
the template specializations in the qregexp.h and qregularexpression.h
headers. Opinions?
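A rough sketch of that layout in plain C++, with hypothetical names: MyString stands in for QString, LiteralRegExp is a toy regexp class, and std::regex plays the role of a second regexp class. The string "header" only declares a member template; each regexp "header" provides the explicit specialization, so the string header never has to include any regexp header.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// ---- would live in the string header: no regexp header is included ----
class MyString {
public:
    explicit MyString(std::string s) : data(std::move(s)) {}

    // Only declared here; each regexp class provides its own specialization.
    template <typename RegExp>
    bool contains(const RegExp &re) const;

private:
    std::string data;
};

// ---- would live in the header of a (toy) regexp class ----
struct LiteralRegExp {
    std::string needle; // toy "pattern": a fixed substring
};

template <>
inline bool MyString::contains<LiteralRegExp>(const LiteralRegExp &re) const
{
    return data.find(re.needle) != std::string::npos;
}

// ---- would live in the header of a second regexp class (std::regex here) ----
template <>
inline bool MyString::contains<std::regex>(const std::regex &re) const
{
    return std::regex_search(data, re);
}
```

With this layout, calling contains() with a type that has no specialization still compiles but fails at link time, because the primary template is declared and never defined; a defined primary template containing a dependent static_assert would turn that into a compile-time error instead.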

And what about the other classes listed above?

Cheers,
-- 
Giuseppe D'Angelo

[1] https://qt.gitorious.org/~peppe/qt/peppes-qtbase/blobs/pcreregexp/src/corelib/tools/qregularexpression.h
[2] https://gitorious.org/qt-regexp-benchmarks/pages/Home
