[Development] QRegularExpression -- first round of API review
Giuseppe D'Angelo
dangelog at gmail.com
Fri Jan 6 17:35:26 CET 2012
Hi,
following w00t's suggestion, I've started to publish some code on
gerrit: <http://codereview.qt-project.org/12319> (refactoring after
removing UTF-8 took longer than doing it right the first time would
have... that's life).
It's still a WIP: many things are missing, and there are some bugs
and FIXME comments here and there, but since this is my first major
contribution I'd like to gather some feedback.
My plan is to wait at least until PCRE releases 8.30 (which includes
UTF-16 support; they plan to release by the end of this month), so
that we can include a stable version in Qt and not have to upgrade
it, then refactor other classes and remove QRegExp in a separate
changeset.
Comments?
I trimmed down the API to a minimum usable subset, and I'd be glad if it
could be reviewed again.
I'm skeptical about keeping ExactMatch (which as of now is really a
hack), since it is nothing more than a normal match with a slightly
different regular expression (which could simply be provided by the
user; PCRE doesn't support it). I also need a couple of suggestions
about method naming and behaviour:
* QRegularExpressionMatch::captureCount actually returns the highest
index of a capturing group that matched something. Ideas?
(lastCapturedIndex?)
* What should QRegularExpressionMatch::subjectOffset return when one
advances a match (e.g. by using
QRegularExpressionMatch::operator++)? The offset at which
"logically" the match is re-attempted (which is the ending position
of the current match + 1) or the one at which it is REALLY
attempted, which could be one or two characters ahead, if the old
match matched an empty string? (Cf. the discussion in my last mail
about attempting /g matches against patterns that can match an empty
string)
* Should endPos(n) return the offset AT the end of the n-th capturing
group, thus enforcing the invariant "matchedLength(n) = endPos(n) -
startPos(n) + 1" and implying that a capturing group of length 0
returns endPos(n) = startPos(n) - 1 (which could seem strange at
first glance)? Or do you prefer endPos(n) to return the offset plus
one (i.e. immediately after the end of the substring captured by the
n-th group), having then "matchedLength(n) = endPos(n) -
startPos(n)"? (3rd option: remove endPos(n) entirely)
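To make the two endPos(n) conventions easier to compare, here is a minimal sketch in plain C++ (no Qt). The Capture struct and its member names are hypothetical and exist only for illustration; they are not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for one capturing group; not part of the proposed API.
struct Capture {
    int start;   // offset of the first captured character
    int length;  // number of captured characters

    // Convention A: offset OF the last captured character.
    int endInclusive() const { return start + length - 1; }
    // Convention B: offset immediately AFTER the last captured character.
    int endExclusive() const { return start + length; }
};

// Both invariants quoted in the mail, checked for one capture.
void checkInvariants(const Capture &c)
{
    assert(c.length == c.endInclusive() - c.start + 1);  // convention A
    assert(c.length == c.endExclusive() - c.start);      // convention B
}
```

For the capture "string" in "a string" (start 2, length 6) convention A gives 7 and convention B gives 8. For a zero-length capture at offset 2, convention A gives 1, i.e. startPos(n) - 1, while convention B gives 2.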
Since it was requested by João, here are some examples of API
usage:
Does a string match a pattern?
Version 1
QString str("a string");
bool matches = str.contains(QRegularExpression("\\bstring\\b"));
// matches == true
Version 2
QString str("a string");
QRegularExpression re("\\bstring\\b");
QRegularExpressionMatch match = re.match(str);
bool matches = match.hasMatch();
// matches == true
Does a string exactly match a pattern?
Version 1
QString str("a string");
bool matches = str.contains(QRegularExpression("\\Aa str\\w+\\z"));
// matches == true
Version 2
QString str("a string");
QRegularExpression re("a str\\w+");
QRegularExpressionMatch match = re.match(str, 0,
QRegularExpression::ExactMatch);
bool matches = match.hasMatch();
// matches == true
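The two versions above differ only in who adds the anchors. As a rough sketch of what "a slightly different regular expression" could mean, the helper below wraps a user pattern in PCRE's \A and \z anchors. The function name is hypothetical, and whether the current ExactMatch hack builds its pattern exactly this way is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the proposed API: wraps a pattern so that
// an ordinary match only succeeds when it spans the whole subject.
std::string anchoredPattern(const std::string &pattern)
{
    // The non-capturing group keeps a top-level alternation such as "a|b"
    // inside the anchors and does not renumber the user's capturing groups.
    return "\\A(?:" + pattern + ")\\z";
}

// Usage: the wrapped pattern is what would be handed to a normal match.
void anchoredPatternExample()
{
    assert(anchoredPattern("a str\\w+") == "\\A(?:a str\\w+)\\z");
    assert(anchoredPattern("a|b") == "\\A(?:a|b)\\z");
}
```

Without the (?:...) group, "a|b" would become "\Aa|b\z", which anchors each alternative on one side only.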
Does a string match a pattern case insensitively?
Version 1
QString str("a StRiNg");
bool matches = str.contains(QRegularExpression("\\bstring\\b",
QRegularExpression::CaseInsensitiveOption));
// matches == true
Version 2
QString str("a StRiNg");
bool matches = str.contains(QRegularExpression("(?i)\\bstring\\b"));
// matches == true
Version 3
QString str("a StRiNg");
QRegularExpression re("\\bstring\\b");
re.setPatternOptions(QRegularExpression::CaseInsensitiveOption);
QRegularExpressionMatch match = re.match(str);
bool matches = match.hasMatch();
// matches == true
What's the starting offset of the first occurrence of the pattern
inside the string?
Version 1
QString str("a string");
int index = str.indexOf(QRegularExpression("\\bstring\\b"));
// index == 2
Version 2
QString str("a string");
QRegularExpression re("\\bstring\\b");
QRegularExpressionMatch match = re.match(str);
int index = match.startPos();
// index == 2
How do I extract matched subexpressions?
Version 1
QString str("a string");
QRegularExpression re("(\\w+) (\\w+)");
QRegularExpressionMatch match = re.match(str);
QString article = match.cap(1);
QString noun = match.cap(2);
// article == "a", noun == "string"
Version 2
QString str("a string");
QRegularExpression re("(?<article>\\w+) (?<noun>\\w+)");
QRegularExpressionMatch match = re.match(str);
QString article = match.cap("article");
QString noun = match.cap("noun");
// article == "a", noun == "string"
// cap(1) and cap(2) will work as usual
How do I check if a string contains one line that exactly matches a pattern?
QString str("a\nstring with\nnot so many\nlines");
QRegularExpression re("^\\w+ \\w+ \\w+$");
re.setPatternOptions(QRegularExpression::MultilineOption);
QRegularExpressionMatch match = re.match(str);
bool matches = match.hasMatch();
// matches == true, "not so many" was matched and captured as cap()
How do I extract all the substrings that match a given pattern?
QString str("lorem ipsum dolor sit amet");
QRegularExpression re("\\b\\w+t\\b"); // words ending in t
QStringList substrings;
for (QRegularExpressionMatch match = re.match(str);
     match.hasMatch(); ++match)
    substrings << match.cap();
How many times does a pattern occur inside a string, not counting
overlapping matches?
QString str("mississippi");
QRegularExpression re("issi");
int count = 0;
for (QRegularExpressionMatch match = re.match(str);
     match.hasMatch(); ++match)
    ++count;
// count == 1
How many times does a pattern occur inside a string, counting
overlapping matches?
Version 1
QString str("mississippi");
int count = str.count(QRegularExpression("issi"));
// count == 2
Version 2
QString str("mississippi");
QRegularExpression re("issi");
int count = 0;
int offset = -1;
while (offset < (str.length() - 1)) {
    QRegularExpressionMatch match = re.match(str, offset + 1);
    if (!match.hasMatch())
        break;
    count++;
    offset = match.startPos();
}
// count == 2
Version 3
QString str("mississippi");
QRegularExpression re("(?=issi)");
int count = 0;
for (QRegularExpressionMatch match = re.match(str);
     match.hasMatch(); ++match)
    ++count;
// count == 2
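Version 3 relies on a pattern that only ever matches the empty string, which is exactly the case where the "logical" and the "real" re-attempt offsets differ. The sketch below shows the advancing rule with std::regex instead of PCRE, so it is only an approximation of what operator++ would have to do: matchOffsets is a hypothetical name, the scan works on single bytes, and it ignores the surrogate-pair and CRLF cases that can require skipping two code units.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: start offsets of all matches in a /g-style scan.
std::vector<int> matchOffsets(const std::string &subject, const std::regex &re)
{
    std::vector<int> offsets;
    std::size_t pos = 0;
    std::smatch m;
    while (pos <= subject.size()) {
        // Past offset 0 there is valid text before the search window, so
        // \b and ^ must not treat the window start as the subject start.
        const auto flags = pos > 0 ? std::regex_constants::match_prev_avail
                                   : std::regex_constants::match_default;
        if (!std::regex_search(subject.begin() + pos, subject.end(), m, re, flags))
            break;
        const std::size_t start = pos + static_cast<std::size_t>(m.position(0));
        offsets.push_back(static_cast<int>(start));
        // "Logical" next offset: right after the current match.
        std::size_t next = start + static_cast<std::size_t>(m.length(0));
        // "Real" next offset: bump by one after an empty match, otherwise
        // the same empty match would be found forever.
        if (m.length(0) == 0)
            ++next;
        pos = next;
    }
    return offsets;
}

void matchOffsetsExample()
{
    const std::string str = "mississippi";
    assert((matchOffsets(str, std::regex("issi")) == std::vector<int>{1}));
    assert((matchOffsets(str, std::regex("(?=issi)")) == std::vector<int>{1, 4}));
}
```

With "issi" the scan resumes at offset 5 and finds nothing more; with "(?=issi)" each match is empty, so the scan resumes one character after each match start and also finds the overlapping occurrence at offset 4.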
How do I check if the user input is validated by a certain pattern?
QString input("the user is writing down this str");
QRegularExpression re("\\bstring\\b");
QRegularExpressionMatch match = re.match(input, 0,
                                         QRegularExpression::PartialMatch);
if (match.hasMatch()) {
    return QValidator::Acceptable;
} else if (match.hasPartialMatch()) {
    // this branch is true; match.cap() == "str"
    return QValidator::Intermediate;
} else {
    return QValidator::Invalid;
}
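For readers unfamiliar with partial matching, the idea behind the three branches can be sketched for the much simpler case of a literal word instead of a regular expression. Everything here is hypothetical illustration in plain C++: the State enum only mirrors the QValidator::State names, classifyLiteral is not part of the patch, and unlike the pattern above it ignores the \b word boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Mirrors the names of QValidator::State; defined here to avoid needing Qt.
enum class State { Invalid, Intermediate, Acceptable };

// Hypothetical helper: classifies input against a literal word.
State classifyLiteral(const std::string &input, const std::string &word)
{
    // Complete match: the word occurs somewhere in the input.
    if (input.find(word) != std::string::npos)
        return State::Acceptable;
    // Partial match: the input ends in the middle of the word, i.e. a
    // non-empty tail of the input is a prefix of the word.
    const std::size_t maxLen = std::min(input.size(), word.size());
    for (std::size_t len = maxLen; len > 0; --len) {
        if (input.compare(input.size() - len, len, word, 0, len) == 0)
            return State::Intermediate;
    }
    return State::Invalid;
}

void classifyLiteralExample()
{
    assert(classifyLiteral("the user is writing down this str", "string")
           == State::Intermediate);
    assert(classifyLiteral("a string", "string") == State::Acceptable);
    assert(classifyLiteral("nothing here", "string") == State::Invalid);
}
```

The Intermediate case corresponds to hasPartialMatch() in the example above: the input ran out before the pattern could either succeed or fail.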
Cheers,
--
Giuseppe D'Angelo