ansaurus

Question

How to evaluate a matched number later in a regex? - Lexing FORTRAN 'H' edit descriptor with Ply

Answer 1

+2 A:

Regex can't do things like that. You can hack it though:

(1[Hh].|2[Hh]..|3[Hh]...|etc...)

Ugly!
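
If typing the alternation by hand is too tedious, it can be generated. This is only a rough sketch in Python: the helper name `hollerith_pattern` and the upper limit of 99 are assumptions made for illustration, not part of the original answer.

```python
import re

def hollerith_pattern(max_len):
    # Hypothetical helper: one alternative per count, i.e. "<n>H" followed
    # by exactly n characters (".{n}" stands in for n literal dots).
    return "|".join(r"%d[Hh].{%d}" % (n, n) for n in range(1, max_len + 1))

# 99 is an arbitrary, assumed upper bound on the character count.
HOLLERITH = re.compile(hollerith_pattern(99))

m = HOLLERITH.match("22HThis is a test string.This is not in the string")
print(m.group(0))
```

The pattern still cannot handle a count above the chosen limit, which is the "limited" part of the hack.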

Mark Byers 2010-02-07 13:17:28

Crude and limited, but effective. Good idea for a one-off.

dmckee 2010-02-07 16:37:50

Answer 2

A:

This is what comes of thinking that regexps can replace a lexer.

Short version: regular expressions can only handle the small subset of all possible languages termed "regular" (big surprise, I know). But "regular" does not coincide with the human notion of "simple", so even very simple languages can contain non-regular constructs. A count-prefixed string such as the H descriptor is one of them: the matcher would have to remember the number it has just read.

Writing a lexer for a simple language is not terribly hard.

The canonical Stack Overflow question for resources on the topic is "Learning to write a compiler".

Ah. I seem to have misunderstood the question. Mea Culpa.

I'm not familiar with Ply, and it's been a while since I used flex, but I think you would eat any number of leading digits, then check in the associated code block whether the rules had been obeyed.
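
A minimal sketch of that idea, using only the standard `re` module so that it runs without Ply: the regex matches just the count and the `H`, and ordinary code then takes the following characters. The function name `lex_hollerith` and its return shape are assumptions for illustration. In Ply, the same logic would presumably sit inside a token function that reads from `t.lexer.lexdata` and advances `t.lexer.lexpos`.

```python
import re

# Matches only the count and the H; the counted text is taken by hand.
COUNT_H = re.compile(r"(\d+)[Hh]")

def lex_hollerith(text, pos=0):
    """Return (count, string, new_pos) for an nH... descriptor at pos.

    Hypothetical helper, not part of Ply or of the original answer.
    """
    m = COUNT_H.match(text, pos)
    if m is None:
        raise ValueError("no H edit descriptor at position %d" % pos)
    count = int(m.group(1))
    start = m.end()
    end = start + count
    # This is the check in the code block: the input must actually
    # contain as many characters as the count promised.
    if end > len(text):
        raise ValueError("H descriptor count runs past the end of input")
    return count, text[start:end], end

result = lex_hollerith("22HThis is a test string.This is not in the string")
print(result)
```

Returning the new position lets the caller resume ordinary tokenizing right after the counted characters.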

dmckee 2010-02-07 16:36:51

Ply is a Python library that implements lex- and yacc-style rules within Python for creating a lexer/parser. I was under the impression that using lex/yacc would save me a lot of tedious coding when writing parsers.

Brendan 2010-02-07 16:48:42

Answer 3

A:

Pyparsing includes an adaptive expression that is very similar to this, called countedArray. countedArray(expr) parses a leading integer 'n' and then parses 'n' instances of expr, returning the whole array as a single list. The way this works is that countedArray parses a leading integer expression, followed by an uninitialized Forward expression. The leading integer expression has a parse action attached that assigns the following Forward to 'n'*expr. The pyparsing parser then continues on, and parses the following 'n' expr's. So it is sort of a self-modifying parser.

To parse your expression, this would look something like:

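from pyparsing import Forward, Word, nums, printables
integer = Word(nums).setParseAction(lambda t: int(t[0]))
following = Forward()
def set_following(t):
    # Rebind the Forward to match exactly t[0] characters; returning None keeps the integer token
    following << Word(printables + " ", exact=t[0])
integer.addParseAction(set_following)
H_expr = integer + 'H' + following
print(H_expr.parseString("22HThis is a test string.This is not in the string"))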

Prints:

[22, 'H', 'This is a test string.']

If Ply has something similar, perhaps you could use this technique.

Paul McGuire 2010-02-08 03:50:31

Thanks, that's useful to know. I had considered Pyparsing earlier but decided to go with the more UNIX'ey old school lex/yacc Ply way - and now the parser is all but written save this last detail!

Brendan 2010-02-08 10:21:19
