[Development] QRegularExpression -- first round of API review

Giuseppe D'Angelo dangelog at gmail.com
Fri Jan 6 17:35:26 CET 2012


    Hi,

    following w00t's suggestion, I've started to publish some code on
    gerrit: <http://codereview.qt-project.org/12319> (refactoring after
    removing UTF-8 took longer than doing it right the first time would
    have... that's life).

    It's still a WIP: many things are missing, and there are some bugs
    and FIXME comments here and there, but I'd like to gather some feedback,
    since this is my first major contribution.

    My plan is to wait at least until PCRE releases 8.30 (which includes
    UTF-16 support; they plan to release by the end of this month), so we
    can include a stable version in Qt and not have to upgrade it, then
    refactor other classes and remove QRegExp in a separate changeset.
    Comments?

    I trimmed down the API to a minimum usable subset, and I'd be glad if it
    could be reviewed again.

    I'm skeptical about keeping ExactMatch (which as of now is really a
    hack), since it is nothing more than a normal match with a slightly
    different regular expression (which the user could simply provide;
    PCRE itself doesn't support exact matching). I also need a couple of
    suggestions about method naming and behaviour:

    *   QRegularExpressionMatch::captureCount actually returns the highest
        index of a capturing group that matched something. Ideas?
        (lastCapturedIndex?)

    *   What should QRegularExpressionMatch::subjectOffset return when one
        advances a match (e.g. by using
        QRegularExpressionMatch::operator++)? The offset at which
        "logically" the match is re-attempted (which is the ending position
        of the current match + 1) or the one at which it is REALLY
        attempted, which could be one or two characters ahead, if the old
        match matched an empty string? (Cf. the discussion in my last mail
        about attempting /g matches against patterns that can match an empty
        string)

    *   Should endPos(n) return the offset AT the end of the n-th capturing
        group, thus enforcing the invariant "matchedLength(n) = endPos(n) -
        startPos(n) + 1" and implying that a capturing group of length 0
        returns endPos(n) = startPos(n) - 1 (which could seem strange at
        first glance)? Or do you prefer endPos(n) to return the offset plus
        one (i.e. immediately after the end of substring captured by the
        n-th group), having then "matchedLength(n) = endPos(n) -
        startPos(n)"? (3rd option: remove endPos(n) entirely)
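    To make the two endPos(n) options concrete, here is a minimal sketch
    in plain C++. It does not use the proposed API: the Span type and the
    two helper functions are hypothetical names introduced only for
    illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type, not part of the proposed API: a captured span
// described by its start offset and its length within the subject string.
struct Span {
    std::ptrdiff_t start;
    std::ptrdiff_t length;
};

// Option 1: endPos is the offset OF the last captured character, so
// length == endPos - start + 1. An empty capture yields start - 1.
std::ptrdiff_t endPosInclusive(const Span &s)
{
    return s.start + s.length - 1;
}

// Option 2: endPos is the offset immediately AFTER the last captured
// character, so length == endPos - start. An empty capture yields start.
std::ptrdiff_t endPosExclusive(const Span &s)
{
    return s.start + s.length;
}
```

    For "string" captured inside "a string" (start 2, length 6), option 1
    gives 7 and option 2 gives 8; for an empty capture at offset 3, option
    1 gives 2 and option 2 gives 3.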

    Since João requested it, here are some examples of API
    usage:

Does a string match a pattern?
  Version 1
     QString str("a string");
     bool matches = str.contains(QRegularExpression("\\bstring\\b"));
     // matches == true

  Version 2
     QString str("a string");
     QRegularExpression re("\\bstring\\b");
     QRegularExpressionMatch match = re.match(str);
     bool matches = match.hasMatch();
     // matches == true

Does a string exactly match a pattern?
  Version 1
     QString str("a string");
     bool matches = str.contains(QRegularExpression("\\Aa str\\w+\\z"));
     // matches == true

  Version 2
     QString str("a string");
     QRegularExpression re("a str\\w+");
     QRegularExpressionMatch match = re.match(str, 0,
                                              QRegularExpression::ExactMatch);
     bool matches = match.hasMatch();
     // matches == true
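
To illustrate why ExactMatch looks redundant, here is a minimal sketch
using std::regex instead of the proposed QRegularExpression API. The helper
names exactMatchByAnchoring and exactMatchBuiltin are made up for this
example, and ^ / $ stand in for \A / \z, which the ECMAScript grammar used
by std::regex does not provide.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: emulate an "exact match" by wrapping the user's
// pattern in a non-capturing group and anchoring both ends, then running
// an ordinary search.
bool exactMatchByAnchoring(const std::string &subject, const std::string &pattern)
{
    const std::regex anchored("^(?:" + pattern + ")$");
    return std::regex_search(subject, anchored);
}

// For comparison: std::regex_match succeeds only if the pattern matches
// the whole subject.
bool exactMatchBuiltin(const std::string &subject, const std::string &pattern)
{
    return std::regex_match(subject, std::regex(pattern));
}
```

The non-capturing group matters for patterns containing a top-level
alternation such as "a|b": without it, the anchors would bind to the
individual alternatives rather than to the whole pattern.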

Does a string match a pattern case insensitively?
  Version 1
     QString str("a StRiNg");
     bool matches = str.contains(QRegularExpression("\\bstring\\b",
                         QRegularExpression::CaseInsensitiveOption));
     // matches == true

  Version 2
     QString str("a StRiNg");
     bool matches = str.contains(QRegularExpression("(?i)\\bstring\\b"));
     // matches == true

  Version 3
     QString str("a StRiNg");
     QRegularExpression re("\\bstring\\b");
     re.setPatternOptions(QRegularExpression::CaseInsensitiveOption);
     QRegularExpressionMatch match = re.match(str);
     bool matches = match.hasMatch();
     // matches == true

What's the starting offset of the first occurrence of the pattern
inside the string?
  Version 1
     QString str("a string");
     int index = str.indexOf(QRegularExpression("\\bstring\\b"));
     // index == 2

  Version 2
     QString str("a string");
     QRegularExpression re("\\bstring\\b");
     QRegularExpressionMatch match = re.match(str);
     int index = match.startPos();
     // index == 2

How do I extract matched subexpressions?
  Version 1
     QString str("a string");
     QRegularExpression re("(\\w+) (\\w+)");
     QRegularExpressionMatch match = re.match(str);
     QString article = match.cap(1);
     QString noun = match.cap(2);
     // article == "a", noun == "string"

  Version 2
     QString str("a string");
     QRegularExpression re("(?<article>\\w+) (?<noun>\\w+)");
     QRegularExpressionMatch match = re.match(str);
     QString article = match.cap("article");
     QString noun = match.cap("noun");
     // article == "a", noun == "string"
     // cap(1) and cap(2) will work as usual

How do I check if a string contains one line that exactly matches a pattern?
     QString str("a\nstring with\nnot so many\nlines");
     QRegularExpression re("^\\w+ \\w+ \\w+$");
     re.setPatternOptions(QRegularExpression::MultilineOption);
     QRegularExpressionMatch match = re.match(str);
     bool matches = match.hasMatch();
     // matches == true, "not so many" was matched and captured as cap()

How do I extract all the substrings that match a given pattern?
     QString str("lorem ipsum dolor sit amet");
     QRegularExpression re("\\b\\w+t\\b"); // words ending in t
     QStringList substrings;
     for (QRegularExpressionMatch match = re.match(str); match.hasMatch(); ++match)
         substrings << match.cap();
     // substrings == ("sit", "amet")

How many times does a pattern occur inside a string, not counting
overlapping matches?
     QString str("mississippi");
     QRegularExpression re("issi");
     int count = 0;
     for (QRegularExpressionMatch match = re.match(str); match.hasMatch(); ++match)
         ++count;
     // count == 1

How many times does a pattern occur inside a string, counting
overlapping matches?
  Version 1
     QString str("mississippi");
     int count = str.count(QRegularExpression("issi"));
     // count == 2

  Version 2
     QString str("mississippi");
     QRegularExpression re("issi");
     int count = 0;
     int offset = -1;
     while (offset < (str.length() - 1)) {
         QRegularExpressionMatch match = re.match(str, offset + 1);
         if (!match.hasMatch())
             break;
         count++;
         offset = match.startPos();
     }
     // count == 2

  Version 3
     QString str("mississippi");
     QRegularExpression re("(?=issi)");
     int count = 0;
     for (QRegularExpressionMatch match = re.match(str); match.hasMatch(); ++match)
         ++count;
     // count == 2
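
The difference between the non-overlapping and the lookahead-based counts
can be checked today with std::regex, whose iterator also has to cope with
empty matches. This is only a sketch under that substitution:
countMatches is a hypothetical helper, and std::sregex_iterator is not the
proposed QRegularExpressionMatch::operator++.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts matches the way a global (/g-style) iteration would: each search
// resumes where the previous match ended. After an empty match,
// std::sregex_iterator first retries for a non-empty match at the same
// position and otherwise moves one character ahead, so it never loops
// forever on patterns that can match an empty string.
std::size_t countMatches(const std::string &subject, const std::string &pattern)
{
    const std::regex re(pattern);
    const std::sregex_iterator first(subject.begin(), subject.end(), re);
    const std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

With "mississippi" as the subject, countMatches returns 1 for "issi" and 2
for "(?=issi)", since the lookahead consumes no characters and therefore
leaves the overlapping occurrence available to the next search.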

How do I check if the user input is validated by a certain pattern?
     QString input("the user is writing down this str");
     QRegularExpression re("\\bstring\\b");
     QRegularExpressionMatch match = re.match(input, 0,
                                              QRegularExpression::PartialMatch);
     if (match.hasMatch()) {
         return QValidator::Acceptable;
     } else if (match.hasPartialMatch()) {
         // this branch is true; match.cap() == "str"
         return QValidator::Intermediate;
     } else {
         return QValidator::Invalid;
     }

Cheers,
-- 
Giuseppe D'Angelo


