[Development] QRegularExpression -- first round of API review

Thiago Macieira thiago.macieira at intel.com
Wed Jan 11 14:29:48 CET 2012


On Wednesday, 11 January 2012 12.49.27, Giuseppe D'Angelo wrote:
> Up?
> Nobody wants to discuss this?

It's above the bikeshed threshold :-)

First, you'll need to get a Nokian to import the PCRE sources. You cannot 
submit them to Gerrit (not even to the commit you made) because that violates 
the CLA. You're not the author. Please don't submit the PCRE code again -- 
just pretend it's there.

As for me, sorry I didn't review. It fell through the cracks of "weekend is 
coming".

The API now looks a lot more digestible. There are still a few methods that I 
will need the documentation for, as I can't guess what they do from their 
names ("subject" is probably an RE term that I don't know). The API around the 
captured texts may need a few more rounds of discussion. The name "cap" 
appeared in Qt 3, and if we can't keep source compatibility with Qt 4 anyway, 
maybe it's time to fix it too. The iterating methods, which are the cool thing 
about this API, seem to be lost. I don't see how to get the contents of that 
match.

Specific questions:

>     *   QRegularExpressionMatch::captureCount returns actually the highest
>         index of a capturing group that matched something. Ideas?
>         (lastCapturedIndex?)

It seems that they are the same thing. captureCount looks fine if the other 
methods also have "capture" in the name. Does this return the number of named 
captures too? E.g. imagine I have two named captures in my RE and nothing 
else. If they match, will that return 2?

If my RE has a capture that is optional and fails to match, how do I find out? 
Imagine:

	rx = /(foo)?(bar)/
	rx =~ "bar"

In this case, the first capture failed to match anything. How do I know that in 
the API?
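
For comparison only, here is a minimal sketch of how the C++11 standard 
library answers that question. It assumes std::regex with its default 
ECMAScript syntax, not PCRE and not the proposed Qt classes, and the helper 
name groupParticipated is made up for illustration. Each sub-match carries a 
"matched" flag, so a group that did not take part in the match can be told 
apart from one that matched an empty string.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns true if capture group `n` took part in the
// first match of `pattern` inside `subject`.
// Uses std::regex (ECMAScript syntax), not the Qt API under review.
bool groupParticipated(const std::string &pattern,
                       const std::string &subject,
                       std::size_t n)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(subject, m, re))
        return false;                  // the pattern did not match at all
    // m.size() is the number of groups in the pattern plus one (group 0),
    // whether or not each group matched; `matched` tells them apart.
    return n < m.size() && m[n].matched;
}
```

With the pattern and subject above, groupParticipated("(foo)?(bar)", "bar", 1) 
is false, while the same call with n = 2 is true.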

>     *   What should QRegularExpressionMatch::subjectOffset return when one
>         advances a match (f.i. by using
>         QRegularExpressionMatch::operator++)? The offset at which
>         "logically" the match is re-attempted (which is the ending position
>         of the current match + 1) or the one at which it is REALLY
>         attempted, which could be one or two characters ahead, if the old
>         match matched an empty string? (Cf. the discussion in my last mail
>         about attempting /g matches against patterns that can match an empty
>         string)

I don't get the question because I don't know what a subject is, so I don't 
know what a subject offset is supposed to be. Still, think about the use-case: 
would someone need this offset? If so, why do they need it? What do they need 
it for? Hopefully, that will help you answer the question.

>     *   Should endPos(n) return the offset AT the end of the n-th capturing
>         group, thus enforcing the invariant "matchedLength(n) = endPos(n) -
>         startPos(n) + 1" and implying that a capturing group of length 0
>         returns endPos(n) = startPos(n) - 1 (which could seem strange on a
>         first look)? Or do you prefer endPos(n) to return the offset plus
>         one (i.e. immediately after the end of substring captured by the
>         n-th group), having then "matchedLength(n) = endPos(n) -
>         startPos(n)"? (3rd option: remove endPos(n) entirely)

How is this even a problem? Under which circumstances does the triad start, 
length, end not hold?

endPos should be one after the last character matched, so that in all 
circumstances
	end = start + length

This holds for all containers, like QString, QByteArray, QVector, etc. If this 
is difficult to visualise in the API, remove the "end" methods and keep only 
start and length.
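
A minimal sketch of that half-open convention, again using std::regex as a 
stand-in for the API under review. The Span struct and the findCapture helper 
are hypothetical names. The point is that an empty capture simply has 
end == start, with no special case.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical value type: `end` is one past the last captured character.
struct Span {
    std::ptrdiff_t start;
    std::ptrdiff_t length;
    std::ptrdiff_t end;
};

// Hypothetical helper: span of capture group `n` for the first match of
// `pattern` in `subject`, or {-1, 0, -1} if the group did not match.
Span findCapture(const std::string &subject,
                 const std::string &pattern,
                 std::size_t n)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(subject, m, re) || n >= m.size() || !m[n].matched)
        return Span{-1, 0, -1};
    Span s;
    s.start = m.position(n);
    s.length = m.length(n);
    s.end = s.start + s.length;   // the invariant: end = start + length
    return s;
}
```

For the pattern "a(b*)c" against "xac", group 1 is empty: start is 2, length 
is 0 and end is 2. Against "xabbc" it is start 2, length 2, end 4, and 
subject.substr(start, length) gives back "bb".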

> Does a string exactly match a pattern?
> 
>   Version 1
>      QString str("a string");
>      bool matches = str.contains(QRegularExpression("\\Aa str\\w+\\z"));
>      // matches == true

An uninitiated user like me might write "^a str\\w+$". I'd expect that to 
work, with ^ meaning the beginning of the string and $ the end by default. 
Note I did not set MultilineOption.
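
As a rough illustration of that expectation, here is how the same pattern 
behaves with std::regex (ECMAScript syntax, default flags). This is only a 
stand-in: it says nothing about what PCRE or the proposed class will do, and 
the function name matchesWhole is made up. The subjects are deliberately 
single-line, since handling of embedded newlines differs between engines.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` as a whole matches "a str" followed by
// one or more word characters, using ^ and $ as the anchors.
bool matchesWhole(const std::string &s)
{
    static const std::regex re("^a str\\w+$");
    // regex_search looks for the pattern anywhere in `s`; the anchors are
    // what pin the match to the start and the end of the subject.
    return std::regex_search(s, re);
}
```

Here matchesWhole("a string") is true, while "xa string" and "a string!" are 
rejected. std::regex_match would give the same answers without the anchors, 
because it always requires the whole subject to match.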

>      for (QRegularExpressionMatch match = re.match(str);
> match.hasMatch(); ++match)
>          substrings << match.cap();

This one mixes STL-style methods (operator++) with Java-style ones (hasMatch). 
Either we do:

	for (match = re.match(); match != re.end(); ++match)

or we do:

	match = re.match();
	while (match.hasNext()) {
		/* whatever */
		match.next();
	}
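
To make the two styles concrete, here is a sketch of both on top of 
std::sregex_iterator, which already is the STL-style variant. The 
MatchIterator class and the collect functions are hypothetical names, not 
part of any proposal. Unlike the loop above, its next() follows the usual 
Java convention of returning the current match and then advancing.

```cpp
#include <regex>
#include <string>
#include <vector>

// STL style: compare against an end iterator and advance with operator++.
std::vector<std::string> collectStl(const std::string &subject,
                                    const std::regex &re)
{
    std::vector<std::string> out;
    const std::sregex_iterator end;   // default-constructed: end of sequence
    for (std::sregex_iterator it(subject.begin(), subject.end(), re);
         it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Java style: hypothetical wrapper exposing hasNext() and next().
class MatchIterator {
public:
    MatchIterator(const std::string &subject, const std::regex &re)
        : m_it(subject.begin(), subject.end(), re) {}

    bool hasNext() const { return m_it != std::sregex_iterator(); }

    // Returns the current match, then moves on to the following one.
    std::smatch next() { return *m_it++; }

private:
    std::sregex_iterator m_it;
};

std::vector<std::string> collectJava(const std::string &subject,
                                     const std::regex &re)
{
    std::vector<std::string> out;
    MatchIterator it(subject, re);
    while (it.hasNext())
        out.push_back(it.next().str());
    return out;
}
```

Both functions return the same list of matched substrings. The difference is 
only in which of the two vocabularies the caller has to learn, which is the 
argument for not mixing them in one class.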

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden