[Qt-interest] Parsing input and performance of QRegExp

Tue Jul 26 13:20:48 CEST 2011

Jens:

  This may be my Luddite bias showing, but your problem seems simple
  enough that a simple state machine and a very few string-compares will
  probably solve it better than a bunch of regexps.

  For example:

  Start in state 0.

  Is the game number always in the first line and always in the same position
  (either starting at character [17] or at least after the "#" character, and always
  terminated by the ":" character)? If so, then just hand-craft a parser that, in
  state 0, only extracts the game number.

  Move on to state 1 where you'll be looking for the player names.

  Are players always a "no-whitespace" string following a line that begins
  with "Seat n:"? Hand craft a parser that grabs those. When a line doesn't
  start with "Seat...", move to state 2.

  In state 2, collect the costs. (I'm not sure I understand what you mean
  there; do you mean "all the bets"?)

  Working across an array of strings already in memory, this all ought to
  run in about a millisecond or so.
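
  The three states above can be sketched roughly as follows. This is only an
  illustration in plain standard C++ (no Qt), and it assumes a layout that the
  thread does not confirm: the game number sits between "#" and ":" on the
  first line, and each player name is the first whitespace-free token after
  "Seat n:". The Hand struct and the parseHand function are hypothetical
  names invented for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical container for what the three states collect.
struct Hand {
    std::string gameNumber;
    std::vector<std::string> players;
    std::vector<std::string> actions;  // raw lines seen in state 2
};

// Assumed layout: first line holds "#<game number>:", followed by
// "Seat n: <name> ..." lines, followed by everything else.
Hand parseHand(const std::vector<std::string>& lines) {
    enum State { GameNumber, Seats, Actions };
    State state = GameNumber;
    Hand hand;

    for (const std::string& line : lines) {
        if (state == GameNumber) {
            // State 0: take the text between '#' and the next ':'.
            const std::size_t hash = line.find('#');
            const std::size_t colon = (hash == std::string::npos)
                                          ? std::string::npos
                                          : line.find(':', hash);
            if (colon != std::string::npos)
                hand.gameNumber = line.substr(hash + 1, colon - hash - 1);
            state = Seats;
            continue;
        }
        if (state == Seats) {
            // State 1: a plain prefix compare, no regexp needed.
            if (line.compare(0, 5, "Seat ") == 0) {
                const std::size_t colon = line.find(':');
                if (colon != std::string::npos) {
                    const std::size_t start = line.find_first_not_of(' ', colon + 1);
                    if (start != std::string::npos) {
                        const std::size_t end = line.find(' ', start);
                        hand.players.push_back(line.substr(
                            start,
                            end == std::string::npos ? std::string::npos : end - start));
                    }
                }
                continue;
            }
            // First non-"Seat" line: switch states and handle it below.
            state = Actions;
        }
        // State 2: keep the raw line for later cost/bet extraction.
        hand.actions.push_back(line);
    }
    return hand;
}
```

  Each line is looked at once and only with find/compare calls, which is the
  reason to expect this kind of parser to be cheap.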

  By the way, if you insist on using regexps, use "anchors" (e.g., "^").
  And if you're sure about the white-spacing, don't use variable quantifiers.
  So, for example, searching for "^Seat \d*: (\w*)" will be a lot faster
  than "Seat\s+\d*:\s*(\w*)" because 1) it allows the regexp engine to
  discard any line that doesn't *BEGIN* with "Seat" and 2) it is a little
  easier to match for one constant space than for an infinitely variable
  number of spaces.
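
  As a small illustration of the anchored form, here is a sketch that uses
  std::regex from the C++ standard library as a stand-in for QRegExp (an
  assumption made only to keep the example self-contained; it says nothing
  about QRegExp timings). The function name seatPlayer is hypothetical, and
  the sketch uses "+" instead of "*" so that an empty seat number or name
  does not match.

```cpp
#include <regex>
#include <string>

// Returns the player name from a "Seat n: name ..." line, or an empty
// string if the line does not begin with "Seat ".
std::string seatPlayer(const std::string& line) {
    // Built once and reused; '^' restricts the match to the line start,
    // and the single literal space replaces the variable "\s+" / "\s*".
    static const std::regex seatRe("^Seat \\d+: (\\w+)");
    std::smatch m;
    if (std::regex_search(line, m, seatRe))
        return m[1].str();
    return std::string();
}
```

  A line such as "Dealer: Seat 3: Carol" is rejected because "Seat" is not at
  the start of the line.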

  There's lots of literature about optimizing regexps.

  But this problem seems simple enough that I wouldn't go there at all.

  Atlant

________________________________
From: qt-interest-bounces+aschmidt=dekaresearch.com at qt.nokia.com [mailto:qt-interest-bounces+aschmidt=dekaresearch.com at qt.nokia.com] On Behalf Of Jens Saathoff
Sent: Saturday, July 23, 2011 09:13
To: qt-interest at trolltech.com
Subject: [Qt-interest] Parsing input and performance of QRegExp

Hi!

I need to parse some input. The input is a poker handhistory-file.
You can view an example here: http://pastebin.com/rzhccyfK

I need the following information:
- All player names
- The game number
- All costs (in total and for each player)

Let's say I have to extract all of this information, put it in a database, and examine the data.

My first attempts were regular expressions and boost::spirit.

I found out that regular expressions are much slower than parsing with boost::spirit. The problem with spirit is that it takes a long time to compile. Very long, but it's fast as hell at run time!

What would you suggest? Use another parser? QLALR?

Can I parse the following with QLALR?
Input1: Player raises $1 to $2 (Name: Player, amount $2)
Input2: Player raises raises $1 to $2 (Name: Player raises, amount $2)
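
One way to handle the ambiguity in Input2 without a grammar is to search from the right. The sketch below is only an illustration under stated assumptions: it is plain standard C++, the names Raise and parseRaise are hypothetical, and it assumes a player name never itself contains the text " raises $".

```cpp
#include <string>

// Hypothetical result type: player name and final amount (without the '$').
struct Raise {
    std::string player;
    std::string amount;
    bool ok;
};

Raise parseRaise(const std::string& line) {
    Raise r{"", "", false};
    const std::string verb = " raises $";
    const std::string to = " to $";

    // Search from the right so that a name ending in "raises" stays intact.
    const std::size_t v = line.rfind(verb);
    if (v == std::string::npos)
        return r;

    const std::size_t t = line.find(to, v + verb.size());
    if (t == std::string::npos)
        return r;

    r.player = line.substr(0, v);          // everything before the last " raises $"
    r.amount = line.substr(t + to.size()); // everything after " to $"
    r.ok = true;
    return r;
}
```

With this, "Player raises raises $1 to $2" yields the name "Player raises" and the amount "2".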

And another thing: is it fast?

What about other parsers? Any experience with Ragel, Bison or something else?

Next thing: Performance of QRegExp!

I did a test with the following code:

void MainWindow::on_btnTest_clicked()
{
    int i = TestRegex();
    qDebug() << "Took: " << i << "\n";
}

// regex, list and tgone are declared elsewhere (presumably as members of MainWindow).
int MainWindow::TestRegex()
{
    list.clear();
    qDebug() << "Start\n";
    tgone.start();
    regex.setCaseSensitivity(Qt::CaseInsensitive);
    regex.setPatternSyntax(QRegExp::RegExp2);
    // Matches a date of the form dd.mm.yyyy
    regex.setPattern("^(\\d{2,2})\\.(\\d{2,2})\\.(\\d{4,4})$");

    if (regex.isValid())
    {
        // Match the same string 5000 times and store the captures.
        for (int i = 0; i < 5000; i++)
        {
            if (regex.indexIn("12.03.2011") != -1)
            {
                list.append(regex.cap(0));
                list.append(regex.cap(1));
                list.append(regex.cap(2));
            }
        }
    }
    qDebug() << "Elements: " << list.count() << "\n";
    return tgone.elapsed();
}

When I run the code for the first time it takes 71 ms, on the second run 400 ms, and on the third 501 ms. It grows! But why?

I know that other frameworks need to compile a regex first. Why is there no need for that with QRegExp?

Thank you very much! I'm really interested in your answers!
