[Qt-interest] Some characters breaking QRegExp

Atlant Schmidt aschmidt at dekaresearch.com
Wed Oct 20 13:22:09 CEST 2010


Paul:

  Interestingly enough, your example works, *AS WRITTEN*, in Perl.

  Here's the output from my test:

temp > perl test.pl
Result = "Rhythm Game"
Result = "Darkside Digital Records"
Result = "Wally Lopez, Ismael Rivas (aka Riva), Marshall (aka Luigi Rocca)"
Result = "Eric Volta's Bust That Remix"

  Included below is the Perl code I executed. Note (for the skeptics?) that I've
  escaped the apostrophe only so that it doesn't end the Perl single-quoted
  initializer strings; the escaping doesn't survive into the actual string data.

  'Looks to me like you found either a bug or a limitation in the Qt regexp
  implementation.

                                    Atlant

 -=-=-=-=-=-

temp > cat test.pl
@test =
  (

    '<td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Rhythm Game<!--/defang_font--></span></td>',

    '<td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Darkside Digital Records<!--/defang_font--></span></td>',

    '<td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Wally Lopez, Ismael Rivas (aka Riva), Marshall (aka Luigi Rocca)<!--/defang_font--></span></td>',

    '<td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Eric Volta\'s Bust That Remix<!--/defang_font--></span></td>'

  );

foreach $string (@test)
  {
#   print "\"$string\"\n";
    $string =~ m/.*font-family:.+\">(.+)<!.*/o;
    print "Result = \"$1\"\n";
#   print "\n";
  }
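As a further cross-check outside both Perl and Qt, here is a minimal C++ sketch of the same test. It uses `std::regex` (ECMAScript grammar) in place of QRegExp, so it only shows that the pattern itself is sound on these inputs; it says nothing about QRegExp's engine. The helper name `extractValue` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by (.+), or an empty string
// when the line does not match. std::regex (ECMAScript grammar) stands in
// for QRegExp here, so this checks the pattern, not Qt's implementation.
std::string extractValue(const std::string& line) {
    // Same pattern as the Perl test and the original QRegExp:
    //   .*font-family:.+">(.+)<!.*
    static const std::regex rx(R"re(.*font-family:.+">(.+)<!.*)re");
    std::smatch m;
    // regex_search mirrors Perl's m// and QRegExp::indexIn (match anywhere).
    if (std::regex_search(line, m, rx)) {
        return m[1].str();  // group 1 = the (.+) capture, like Perl's $1
    }
    return std::string();
}
```

Called on each of the four strings in `@test` (no escaping of the apostrophe is needed in a C++ raw string), it returns the same four values as the Perl run.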

________________________________
From: qt-interest-bounces at trolltech.com [mailto:qt-interest-bounces at trolltech.com] On Behalf Of Paul England
Sent: Wednesday, October 20, 2010 04:35
To: qt-interest at trolltech.com
Subject: [Qt-interest] Some characters breaking QRegExp

Hi.

I think I have some characters which are breaking my QRegExp.

It's a simple pattern for reading specific tags out of a webpage.

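QRegExp rx_val( ".*font-family:.+\">(.+)<!.*" );

while ( !page.atEnd() ) {
    QString line = page.readLine();        // assumes page is a QTextStream
    if ( rx_val.indexIn( line ) != -1 ) {
        QString value = rx_val.cap( 1 );   // text captured by (.+)
        // parse value;
    }
}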
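The loop only works if each iteration actually reads a line from `page` before matching it. A minimal sketch of that read-then-match structure, assuming a `std::istream` in place of the QTextStream/QFile and `std::regex` in place of QRegExp; the helper `collectValues` is hypothetical:

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `page` line by line and collects the text
// captured by (.+) on every matching line. std::istream and std::regex
// stand in for the Qt classes used in the original loop.
std::vector<std::string> collectValues(std::istream& page) {
    static const std::regex rx(R"re(.*font-family:.+">(.+)<!.*)re");
    std::vector<std::string> values;
    std::string line;
    // std::getline both reads the next line and reports end of input,
    // so the loop cannot spin on a line that was never read.
    while (std::getline(page, line)) {
        std::smatch m;
        if (std::regex_search(line, m, rx)) {
            values.push_back(m[1].str());
        }
    }
    return values;
}
```

For example, wrapping the page text in a `std::istringstream` and passing it to `collectValues` returns one entry per matching line, in order.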

Here are some examples that work:


<td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Rhythm Game<!--/defang_font--></span></td>

<td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Darkside Digital Records<!--/defang_font--></span></td>



These two do not. The only difference is the parentheses in one and the apostrophe in the other.



  <td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Wally Lopez, Ismael Rivas (aka Riva), Marshall (aka Luigi Rocca)<!--/defang_font--></span></td>

  <td style="margin: 2px; padding: 1px; border: 0px solid; "><!--defang_font size="1" face="Verdana" style--><span style="font-size: x-small; font-family: Verdana; ">Eric Volta's Bust That Remix<!--/defang_font--></span></td>





I've never heard of the string being searched needing to have its characters escaped.
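That matches how regular expressions work in general: `(`, `)` and friends are metacharacters only in the pattern, so backslashes go in the pattern and the subject string is passed as-is. A minimal C++ sketch, again using `std::regex` rather than QRegExp; the helper `akaName` is hypothetical and exists only for this illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first name written as "(aka NAME)".
// The parentheses are escaped in the PATTERN because they are literal here;
// the subject string is searched exactly as it appears on the page.
std::string akaName(const std::string& subject) {
    static const std::regex aka(R"re(\(aka ([^)]+)\))re");
    std::smatch m;
    return std::regex_search(subject, m, aka) ? m[1].str() : std::string();
}
```

Passing the unescaped "Wally Lopez, Ismael Rivas (aka Riva), ..." text returns "Riva"; the apostrophe in "Eric Volta's" likewise needs no escaping in the subject.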


