[Qt-interest] Parsing input and performance of QRegExp
Atlant Schmidt
aschmidt at dekaresearch.com
Tue Jul 26 13:20:48 CEST 2011
Jens:
This may be my Luddite bias showing, but your problem seems simple
enough that a simple state machine and a very few string-compares will
probably solve it better than a bunch of regexps.
For example:
Start in state 0.
Is the game number always in the first line and always in the same position
(either starting at character [17] or at least after the "#" character, and always
terminated by the ":" character)? If so, then just hand-craft a parser that, in
state 0, only extracts the game number.
Move on to state 1 where you'll be looking for the player names.
Are players always a "no-whitespace" string following a line that begins
with "Seat n:"? Hand craft a parser that grabs those. When a line doesn't
start with "Seat...", move to state 2.
In state 2, collect the costs. (I'm not sure I understand what you mean
there; do you mean "all the bets"?)
Working across an array of strings already in memory, this all ought to
run in about a millisecond or so.
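A minimal sketch of the three states described above, in plain C++ with std::string so it compiles without Qt (the same logic carries over to QString and QStringList). The Hand struct, the parseHand name and the assumed line layout ("#<number>:" on the first line, then "Seat n: <name> ..." lines, then everything else) are illustrative assumptions, not taken from the actual hand-history file:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type; the field names are illustrative only.
struct Hand {
    std::string gameNumber;
    std::vector<std::string> players;
    std::vector<std::string> actionLines;  // raw lines left for state 2 to interpret
};

// Assumed layout: line 1 contains "#<number>:", followed by
// "Seat n: <name> ..." lines, followed by everything else.
Hand parseHand(const std::vector<std::string>& lines) {
    enum class State { GameNumber, Seats, Actions };
    const std::string seatPrefix = "Seat ";
    const std::size_t npos = std::string::npos;

    State state = State::GameNumber;
    Hand hand;

    for (const std::string& line : lines) {
        if (state == State::GameNumber) {
            // State 0: take the text between '#' and the next ':'.
            const std::size_t hash = line.find('#');
            const std::size_t colon = (hash == npos) ? npos : line.find(':', hash);
            if (colon != npos) {
                hand.gameNumber = line.substr(hash + 1, colon - hash - 1);
            }
            state = State::Seats;
            continue;
        }
        if (state == State::Seats) {
            if (line.compare(0, seatPrefix.size(), seatPrefix) == 0) {
                // State 1: the name is the first whitespace-free token after ": ".
                const std::size_t colon = line.find(": ");
                if (colon != npos) {
                    const std::size_t begin = colon + 2;
                    const std::size_t end = line.find(' ', begin);
                    hand.players.push_back(
                        line.substr(begin, end == npos ? npos : end - begin));
                }
                continue;
            }
            // First non-"Seat" line: switch to state 2 and handle this line there.
            state = State::Actions;
        }
        // State 2: collect the remaining lines (bets and so on) for later parsing.
        hand.actionLines.push_back(line);
    }
    return hand;
}
```

Calling parseHand on the lines of one hand fills gameNumber and players using only find, compare and substr; how the lines collected in state 2 are split into amounts depends on the real file format.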
By the way, if you insist on using regexps, use "anchors" (e.g., "^").
And if you're sure about the white-spacing, don't use variable quantifiers.
So, for example, searching for "^Seat \d*: (\w*)" will be a lot faster
than "Seat\s+\d*:\s*(\w*)" because 1) it allows the regexp engine to
discard any line that doesn't *BEGIN* with "Seat" and 2) it is a little
easier to match for one constant space than for an infinitely variable
number of spaces.
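A small sketch of the anchored pattern applied to one line at a time. It uses std::regex rather than QRegExp only so that it compiles without Qt; the seatPlayer helper is a hypothetical name, and the sketch swaps "*" for "+" so that an empty seat number or an empty name cannot match, which is a change from the pattern quoted above:

```cpp
#include <regex>
#include <string>

// Returns the captured player name, or an empty string when the line
// does not begin with "Seat <digits>: ".
std::string seatPlayer(const std::string& line) {
    // Anchored at the start of the line, literal single spaces, and the
    // pattern object is built once and reused across calls.
    static const std::regex seatPattern(R"(^Seat \d+: (\w+))");

    std::smatch match;
    if (std::regex_search(line, match, seatPattern)) {
        return match[1].str();
    }
    return std::string();
}
```

Because of the "^" anchor, a line that merely contains "Seat 3: ..." somewhere in the middle is rejected at its first character instead of being scanned position by position.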
There's lots of literature about optimizing regexps.
But this problem seems simple enough that I wouldn't go there at all.
Atlant
________________________________
From: qt-interest-bounces+aschmidt=dekaresearch.com at qt.nokia.com [mailto:qt-interest-bounces+aschmidt=dekaresearch.com at qt.nokia.com] On Behalf Of Jens Saathoff
Sent: Saturday, July 23, 2011 09:13
To: qt-interest at trolltech.com
Subject: [Qt-interest] Parsing input and performance of QRegExp
Hi!
I need to parse some input. The input is a poker handhistory-file.
You can view an example here: http://pastebin.com/rzhccyfK
I need the following information:
- All Playernames
- Gamenumber
- All costs (and of each player)
Let's say I have to get all the information to put it in a database and examine the data.
My first attempts were regular expressions and boost::spirit.
I found out that regular expressions are much slower than parsing with boost::spirit. The problem with spirit is that it takes a very long time to compile, but it's fast as hell!
What would you suggest? Use another parser? QLALR?
Can I parse the following with QLALR?
Input1: Player raises $1 to $2 (Name: Player, amount $2)
Input2: Player raises raises $1 to $2 (Name: Player raises, amount $2)
And another thing: is it fast?
What about other parsers? Any experience with Ragel, Bison or something else?
Next thing: performance of QRegExp!
I did a test with the following code:
void MainWindow::on_btnTest_clicked()
{
    int i = TestRegex();
    qDebug() << "Dauerte: " << i << "\n";  // "Dauerte" = "took" (elapsed ms)
}

int MainWindow::TestRegex()
{
    list.clear();
    qDebug() << "Start\n";
    tgone.start();
    regex.setCaseSensitivity(Qt::CaseInsensitive);
    regex.setPatternSyntax(QRegExp::RegExp2);
    regex.setPattern("^(\\d{2,2})\\.(\\d{2,2})\\.(\\d{4,4})$");
    if(regex.isValid())
    {
        int i=0;
        i++;
        for(i=0; i<5000; i++)
        {
            if (regex.indexIn("12.03.2011") != -1)
            {
                list.append(regex.cap(0));
                list.append(regex.cap(1));
                list.append(regex.cap(2));
            }
        }
    }
    qDebug() << "Elemente: " << list.count() << "\n";  // "Elemente" = "elements"
    return tgone.elapsed();
}
If I run the code for the first time, it takes 71 ms; on the second run 400 ms, on the third 501 ms. It grows!! But why?
I know that other frameworks need to compile a regex; why is there no need for that with QRegExp?
Thank you very much! I'm really interested in your answers!!!