[Qt-interest] QRegExp for document analysing
Omar AKHAM
crtx.omar at gmail.com
Mon Dec 6 13:59:03 CET 2010
Okay, thank you. So how can I tell when to use a regex and when to use
a "normal" parser?
Omar.
On 06/12/10 09:47, Diego Iastrubni wrote:
> Parse it by reading lines and using a "normal" parser. I don't think
> that regex is the best tool for this job.
>
> On Sun, Dec 5, 2010 at 7:38 PM, Omar AKHAM <crtx.omar at gmail.com> wrote:
>
> Hi,
>
> I'm a new Python/Qt4 (PyQt4) developer, and I have some basic
> knowledge of regular expressions. I need to process some text to
> extract specific information from it, and I need some help :).
>
> I'm experimenting with the CACM collection. I have a file called
> "cacm.all" which contains the descriptions of 3204 documents, in
> the following format:
>
> .
> .
> .
> .I 20
> .T
> Accelerating Convergence of Iterative Processes
> .W
> A technique is discussed which, when applied
> to an iterative procedure for the solution of
> an equation, accelerates the rate of convergence if
> the iteration converges and induces convergence if
> the iteration diverges. An illustrative example is given.
> .B
> CACM June, 1958
> .A
> Wegstein, J. H.
> .N
> CA580602 JB March 22, 1978 9:09 PM
> .X
> 20 5 20
> 20 5 20
> 20 5 20
> .I 21
> .T
> Algebraic Formulation of Flow Diagrams
> .B
> CACM June, 1958
> .A
> Voorhees, E. A.
> .N
> CA580601 JB March 22, 1978 9:10 PM
> .X
> 21 5 21
> 21 5 21
> 21 5 21
> 679 5 21
> 21 6 21
> 407 6 21
> 3184 6 21
> .
> .
> .
>
>
> Each document description starts with ".I DOC_NUM" and contains
> descriptive sections such as the title (".T"), the summary (".W")
> and so on. My experiment consists of extracting each document's
> number, title and summary into a list, ignoring the other sections,
> in order to index them. I tried the Python "re" module like this:
>
> import re
> cacmCollection = open('cacm.all', 'rU').read()
> regex = r'(\.[I]\s+\d+)\n(?:(\.(?:T|W)))+?'
> docs = re.findall(regex, cacmCollection)
>
> I'm expecting tuples such as:
> [('.I 20', '.T','Accelerating Convergence of Iterative
> Processes','.W','A technique is discussed which, when applied\nto
> an iterative procedure for the solution of\nan equation,
> accelerates the rate of convergence if\nthe iteration converges
> and induces convergence if\nthe iteration diverges. An
> illustrative example is given.')
> ,('.I 21', '.T','Algebraic Formulation of Flow Diagrams'),......]
>
> But I don't get what I'm expecting...
> Can anyone correct me, or show me another technique to do this
> (without using a loop with comparisons), with either the Python
> "re" module or "QRegExp"?
>
> Thanks
> Omar
>
> _______________________________________________
> Qt-interest mailing list
> Qt-interest at trolltech.com
> http://lists.trolltech.com/mailman/listinfo/qt-interest
>
>
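For reference, the line-by-line approach suggested in the quoted reply could look roughly like the sketch below. It is only an illustration, not code from the thread: the function name `parse_cacm` and the `keep` parameter are made up for this example, and it assumes that every section marker sits alone on its own line as a dot followed by one capital letter (".T", ".W", ".B", ...), as in the excerpt quoted above. The short `sample` string stands in for the contents of "cacm.all".

```python
from typing import Dict, Iterable, List


def parse_cacm(lines: Iterable[str], keep=("T", "W")) -> List[Dict[str, str]]:
    """Collect the number plus the kept sections (title, summary) of each record.

    Hypothetical helper: assumes markers such as ".T" stand alone on a line.
    """
    docs: List[Dict[str, str]] = []
    current = None  # record currently being filled
    field = None    # letter of the section currently being read, e.g. "T"
    for raw in lines:
        line = raw.rstrip()
        if line.startswith(".I "):
            # A new document starts: remember its number.
            current = {"I": line[3:].strip()}
            docs.append(current)
            field = None
        elif len(line) == 2 and line[0] == "." and line[1].isupper():
            # A section marker: switch the current field.
            field = line[1]
        elif current is not None and field in keep:
            # A content line of a section we want; join multi-line text.
            if field in current:
                current[field] += "\n" + line
            else:
                current[field] = line
    return docs


# Small stand-in for the file contents, shortened from the excerpt above.
sample = """.I 20
.T
Accelerating Convergence of Iterative Processes
.W
A technique is discussed which, when applied
to an iterative procedure for the solution of
an equation, accelerates the rate of convergence.
.B
CACM June, 1958
.X
20 5 20
.I 21
.T
Algebraic Formulation of Flow Diagrams
.B
CACM June, 1958
"""

docs = parse_cacm(sample.splitlines())
for doc in docs:
    print(doc["I"], "|", doc.get("T"), "|", doc.get("W"))
```

With the real file, something like `parse_cacm(open('cacm.all'))` should work the same way, since a file object yields its lines one at a time. Records without a ".W" section (such as document 21) simply have no "W" key, which is the kind of optional structure that is awkward to express in a single regex.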