[Development] QRegularExpression -- first round of API review
Thiago Macieira
thiago.macieira at intel.com
Wed Jan 11 14:29:48 CET 2012
On Wednesday, 11 January 2012 12.49.27, Giuseppe D'Angelo wrote:
> Up?
> Nobody wants to discuss this?
It's above the bikeshed threshold :-)
First, you'll need to get a Nokian to import the PCRE sources. You cannot
submit them to Gerrit (not even to the commit you made) because that violates
the CLA. You're not the author. Please don't submit the PCRE code again --
just pretend it's there.
As for me, sorry I didn't review. It fell through the cracks of "weekend is
coming".
The API now looks a lot more digestible. There are still a few methods that I
will need the documentation for, as I can't guess what they do from their
names ("subject" is probably an RE term that I don't know). The API around the
captured texts may need a few more rounds of discussion. The name "cap"
appeared in Qt 3 and if we're not able to keep source compatibility with Qt 4
anyway, maybe it's time to fix it too. The iterating methods, which are the
cool thing about this API, seem to be lost. I don't see how to get the
contents of that match.
Specific questions:
> * QRegularExpressionMatch::captureCount returns actually the highest
> index of a capturing group that matched something. Ideas?
> (lastCapturedIndex?)
It seems that they are the same thing. captureCount looks fine if the other
methods also have "capture" in the name. Does this return the number of named
captures too? E.g. imagine I have two named captures in my RE and nothing
else. If they match, will that return 2?
If my RE has a capture that is optional and fails to match, how do I find out?
Imagine:
rx = /(foo)?(bar)/
rx =~ "bar"
In this case, the first capture failed to match anything. How do I find that
out through the API?
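To make the question concrete, here is a minimal sketch using std::regex as a stand-in. It is not the proposed QRegularExpression API, and the helper name groupParticipated is made up for illustration. It only shows one way a library can report that an optional group took no part in a match.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns true if capture group `n` took part in the
// first match of `pattern` in `subject`.
// std::regex is a stand-in here, not the Qt API under review.
bool groupParticipated(const std::string &pattern,
                       const std::string &subject,
                       std::size_t n)
{
    std::smatch m;
    if (!std::regex_search(subject, m, std::regex(pattern)))
        return false;                 // no match at all
    // m.size() counts the whole match plus every group in the pattern;
    // the per-group `matched` flag says whether that group participated.
    return n < m.size() && m[n].matched;
}
```

With the pattern "(foo)?(bar)" and the subject "bar", group 1 reports false and group 2 reports true. Note that in std::regex the size of the match object depends only on the number of groups in the pattern, not on which ones matched.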
> * What should QRegularExpressionMatch::subjectOffset return when one
> advances a match (f.i. by using
> QRegularExpressionMatch::operator++)? The offset at which
> "logically" the match is re-attempted (which is the ending position
> of the current match + 1) or the one at which it is REALLY
> attempted, which could be one or two characters ahead, if the old
> match matched an empty string? (Cf. the discussion in my last mail
> about attempting /g matches against patterns that can match an empty
> string)
I don't get the question because I don't know what a subject is, so I don't
know what a subject offset is supposed to be. Still, think about the use-case:
would someone need this offset? If so, why do they need it? What do they need
it for? Hopefully, that will help you answer the question.
> * Should endPos(n) return the offset AT the end of the n-th capturing
> group, thus enforcing the invariant "matchedLength(n) = endPos(n) -
> startPos(n) + 1" and implying that a capturing group of length 0
> returns endPos(n) = startPos(n) - 1 (which could seem strange on a
> first look)? Or do you prefer endPos(n) to return the offset plus
> one (i.e. immediately after the end of substring captured by the
> n-th group), having then "matchedLength(n) = endPos(n) -
> startPos(n)"? (3rd option: remove endPos(n) entirely)
How is this even a problem? Under which circumstances does the triad start,
length, end not hold?
endPos should be one after the last character matched, so that in all
circumstances
end = start + length
This holds for all containers, like QString, QByteArray, QVector, etc. If this
is difficult to visualise in the API, remove the "end" methods and keep only
start and length.
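A small sketch of that invariant, again with std::regex standing in for the API under review. The Span struct and captureSpan helper are hypothetical names used only here. The end offset is taken as one past the last captured character, so an empty capture has end equal to start.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical value type for illustration only.
struct Span {
    std::ptrdiff_t start;
    std::ptrdiff_t length;
    std::ptrdiff_t end;   // one past the last captured character
};

// Returns the span of capture group `n` for the first match, or
// {-1, 0, -1} if there is no match or the group did not participate.
Span captureSpan(const std::string &pattern,
                 const std::string &subject,
                 std::size_t n)
{
    std::smatch m;
    if (!std::regex_search(subject, m, std::regex(pattern))
        || n >= m.size() || !m[n].matched)
        return Span{-1, 0, -1};
    const std::ptrdiff_t start = m.position(n);
    // The sub_match's `second` iterator points one past the capture.
    const std::ptrdiff_t end = m[n].second - subject.begin();
    return Span{start, m.length(n), end};
}
```

For "a(b*)c" against "xxabbc", group 1 starts at 3 with length 2 and ends at 5. Against "xxac" the same group is empty: start 3, length 0, end 3. In both cases end = start + length.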
> Does a string exactly match a pattern?
>
> Version 1
> QString str("a string");
> bool matches = str.contains(QRegularExpression("\\Aa str\\w+\\z"));
> // matches == true
An uninitiated user like me might write "^a str\\w+$". I'd expect that to work
and, by default, ^ is the beginning of the string and $ the end. Note I did
not set MultilineOption.
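For comparison, a sketch of the same expectation with std::regex (a stand-in, not the Qt API; matchesWhole is a made-up helper name). The subject is a single line, so the question of how ^ and $ behave around embedded newlines does not come up here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `subject` matches the pattern
// from the example, written with ^ and $ instead of \A and \z.
bool matchesWhole(const std::string &subject)
{
    static const std::regex re("^a str\\w+$");
    // regex_search plus explicit anchors; no multi-line option is set.
    return std::regex_search(subject, re);
}
```

Here matchesWhole("a string") is true, while "xa string" and "a string!" are both rejected because of the anchors.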
> for (QRegularExpressionMatch match = re.match(str);
>      match.hasMatch(); ++match)
>     substrings << match.cap();
This one mixes STL-style methods (operator++) with Java-style ones. Either we
do:
    for (match = re.match(); match != re.end(); ++match)
or we do:
    match = re.match();
    while (match.hasNext()) {
        /* whatever */
        match.next();
    }
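As an illustration of the pure STL style, here is how the same loop looks with std::sregex_iterator. This is std::regex, not the proposed Qt classes, and allMatches is a hypothetical helper. The point is only that the loop compares against an end iterator instead of asking the match object whether it is valid.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text of every match of `pattern`
// in `subject`, iterating STL-style.
std::vector<std::string> allMatches(const std::string &pattern,
                                    const std::string &subject)
{
    const std::regex re(pattern);
    std::vector<std::string> substrings;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(subject.begin(), subject.end(), re), end;
         it != end; ++it)
        substrings.push_back(it->str());
    return substrings;
}
```

Calling allMatches("\\w+", "a few words") yields the three words in order.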
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden