[Qt-interest] Parsing input and performance of QRegExp
Jens Saathoff
jensesaat at googlemail.com
Tue Jul 26 16:32:01 CEST 2011
Hi! Thanks for your answer!
Helpful! But why does it slow down so quickly and so much?
I thought about it and concluded that a parser framework would be a bit
easier to maintain. The files can contain English and German text, and
building a regex for different languages is a bit ugly, I think.
Do you have any experience with parser frameworks?
Thank you!
2011/7/26 Atlant Schmidt <aschmidt at dekaresearch.com>
> Jens:
>
> This may be my Luddite bias showing, but your problem seems simple
> enough that a simple state machine and a very few string-compares will
> probably solve it better than a bunch of regexps.
>
> For example:
>
> Start in state 0.
>
> Is the game number always in the first line and always in the same
> position (either starting at character [17] or at least after the “#”
> character, and always terminated by the “:” character)? If so, then just
> hand-craft a parser that, in state 0, only extracts the game number.
>
> Move on to state 1 where you’ll be looking for the player names.
>
> Are players always a “no-whitespace” string following a line that begins
> with “Seat n:”? Hand craft a parser that grabs those. When a line doesn’t
> start with “Seat...”, move to state 2.
>
> In state 2, collect the costs. (I’m not sure I understand what you mean
> there; do you mean “all the bets”?)
>
> Working across an array of strings already in memory, this all ought to
> run in about a millisecond or so.
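The three states described above could be sketched roughly as follows. This is only an illustration in plain standard C++ (std::string instead of QString, so it stands alone). The `Hand` struct, the `parseHand` name and the assumed line layout ("... #<number>: ..." on the first line, then "Seat n: name ..." lines, then everything else) are assumptions and are not taken from the actual hand-history file. One deliberate deviation: the sketch only leaves state 1 after at least one "Seat" line has been seen, so that other header lines between the first line and the seats are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type; the field names are illustrative only.
struct Hand {
    std::string gameNumber;
    std::vector<std::string> players;
    std::vector<std::string> actionLines;  // raw lines, left for a later "costs" pass
};

enum class State { GameNumber, Seats, Actions };

// True if `line` begins with `prefix`.
static bool startsWith(const std::string& line, const std::string& prefix) {
    return line.compare(0, prefix.size(), prefix) == 0;
}

Hand parseHand(const std::vector<std::string>& lines) {
    const std::size_t npos = std::string::npos;
    Hand hand;
    State state = State::GameNumber;
    for (const std::string& line : lines) {
        if (state == State::GameNumber) {
            // State 0: take the text between '#' and the next ':'.
            const std::size_t hash = line.find('#');
            const std::size_t colon = (hash == npos) ? npos : line.find(':', hash);
            if (colon != npos) {
                hand.gameNumber = line.substr(hash + 1, colon - hash - 1);
                state = State::Seats;
            }
        } else if (state == State::Seats && startsWith(line, "Seat ")) {
            // State 1: the name is the whitespace-free token after ": ".
            const std::size_t colon = line.find(": ");
            if (colon != npos) {
                const std::size_t begin = colon + 2;
                const std::size_t end = line.find(' ', begin);
                hand.players.push_back(
                    line.substr(begin, end == npos ? npos : end - begin));
            }
        } else if (state == State::Actions || !hand.players.empty()) {
            // State 2: the first non-"Seat" line after the seats, and all later lines.
            state = State::Actions;
            hand.actionLines.push_back(line);
        }
        // Otherwise: a non-"Seat" line before any seat was seen; skip it.
    }
    return hand;
}
```

Called with the lines of one hand already in memory, `parseHand` fills in the game number and the player names and keeps the remaining lines untouched, so the "costs" step can be written separately once it is clear what should be collected.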
>
> By the way, if you insist on using regexps, use “anchors” (e.g., “^”).
> And if you’re sure about the white-spacing, don’t use variable
> quantifiers.
>
> So, for example, searching for “^Seat \d*: (\w*)” will be a lot faster
> than “Seat\s+\d*:\s*(\w*)” because 1) it allows the regexp engine to
> discard any line that doesn’t **BEGIN** with “Seat” and 2) it is a little
> easier to match one constant space than an infinitely variable
> number of spaces.
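As a small illustration of the anchored pattern, here is a sketch that uses std::regex from the C++ standard library rather than QRegExp, purely so that it is self-contained. The function name `seatName` is made up for this example, and `\d+` and `\w+` are used instead of `\d*` and `\w*` so that an empty seat number or name does not match. No timing is claimed for it; it only shows the anchor and the fixed single spaces.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the player name from a "Seat n: name ..." line,
// or an empty string if the line does not have that shape.
// (Hypothetical helper, not part of any Qt or poker-site API.)
std::string seatName(const std::string& line) {
    // Built once and reused; "^" ties the match to the start of the line,
    // and the single literal spaces replace the variable \s+ / \s*.
    static const std::regex pattern(R"(^Seat \d+: (\w+))");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

A line such as "Seat 3: carol ($5 in chips)" yields "carol", while a line that merely contains "Seat 3: carol" somewhere in the middle, or that uses two spaces after "Seat", is rejected.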
>
> There’s lots of literature about optimizing regexps.
>
> But this problem seems simple enough that I wouldn’t go there at all.
>
> Atlant
>
> ------------------------------
>
> From: qt-interest-bounces+aschmidt=dekaresearch.com at qt.nokia.com
>   On Behalf Of Jens Saathoff
> Sent: Saturday, July 23, 2011 09:13
> To: qt-interest at trolltech.com
> Subject: [Qt-interest] Parsing input and performance of QRegExp
>
> Hi!
>
> I need to parse some input. The input is a poker hand-history file.
> You can view an example here: http://pastebin.com/rzhccyfK
>
> I need the following information:
>
> - All player names
> - Game number
> - All costs (and of each player)
>
> Let's say I have to get all the information to put into a database and to
> examine the data.
>
> My first attempts were regular expressions and boost::spirit.
>
> I found out that regular expressions are much slower than parsing with
> boost::spirit. The problem with spirit is that it takes a very long time
> to compile, but at run time it is fast as hell!
>
> What would you suggest? Use another parser? QLALR?
>
> Can I parse the following with QLALR?
>
> Input1: Player raises $1 to $2 (Name: Player, amount $2)
> Input2: Player raises raises $1 to $2 (Name: Player raises, amount $2)
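For what it is worth, the two inputs above can also be split without a parser generator. The following is a minimal hand-written sketch in standard C++, not a QLALR grammar. It assumes that the action always ends in the fixed shape " raises $<from> to $<to>" and that a player name never itself contains the text " raises $"; the `Raise` struct and `parseRaise` name are invented for the example. Searching from the right means a name such as "Player raises" stays intact.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for one "raises" line.
struct Raise {
    std::string player;
    std::string from;  // amount before the raise, without the '$'
    std::string to;    // amount after the raise, without the '$'
};

// Splits "<name> raises $<from> to $<to>" into its parts.
// Returns false if the line does not have that shape.
bool parseRaise(const std::string& line, Raise& out) {
    static const std::string verb = " raises $";
    static const std::string sep = " to $";
    const std::size_t npos = std::string::npos;

    // Search from the right so the name may itself contain "raises".
    const std::size_t v = line.rfind(verb);
    if (v == npos || v == 0) {
        return false;  // no verb found, or an empty player name
    }
    const std::size_t amountStart = v + verb.size();
    const std::size_t s = line.find(sep, amountStart);
    if (s == npos) {
        return false;
    }
    out.player = line.substr(0, v);
    out.from = line.substr(amountStart, s - amountStart);
    out.to = line.substr(s + sep.size());
    return true;
}
```

For Input1 this gives the name "Player" and the amount "2"; for Input2 it gives the name "Player raises" and again the amount "2".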
>
> And another thing: is it fast?
>
> What about other parsers? Any experience with Ragel, Bison or something
> else?
>
> Next thing: performance of QRegExp!
>
> I did a test with the following code:
>
> void MainWindow::on_btnTest_clicked()
> {
>     int i = TestRegex();
>     qDebug() << "Dauerte: " << i << "\n";  // "Took: <ms>"
> }
>
> // list, tgone and regex are not declared here; presumably members of MainWindow.
> int MainWindow::TestRegex()
> {
>     list.clear();
>     qDebug() << "Start\n";
>     tgone.start();
>     regex.setCaseSensitivity(Qt::CaseInsensitive);
>     regex.setPatternSyntax(QRegExp::RegExp2);
>     regex.setPattern("^(\\d{2,2})\\.(\\d{2,2})\\.(\\d{4,4})$");
>
>     if (regex.isValid())
>     {
>         // Match the same date string 5000 times; store the whole match
>         // plus the first two groups each time.
>         for (int i = 0; i < 5000; i++)
>         {
>             if (regex.indexIn("12.03.2011") != -1)
>             {
>                 list.append(regex.cap(0));
>                 list.append(regex.cap(1));
>                 list.append(regex.cap(2));
>             }
>         }
>     }
>     qDebug() << "Elemente: " << list.count() << "\n";  // "Elements: <count>"
>     return tgone.elapsed();
> }
>
> If I run the code for the first time, it takes 71 ms; on the second run
> 400 ms, on the third 501 ms. It grows! But why?
>
> I know that other frameworks need to compile a regex; why is there no
> need for that with QRegExp?
>
> Thank you very much! I'm really interested in your answers!