views:

828

answers:

3

I've tried taking this code and converting it to something for a project I'm working on for programming language processing, but I'm running into an issue with a simplified version:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

I've played around with a number of different modifications of this simple setup. Usually, trying something like:

print(expr.parseString('1+2'))

Will return ['1']. While I get caught in deep recursion with something like:

print(expr.parseString('(1+2)'))

What am I missing with respect to simple recursion that I can't parse arbitrarily arithmetic expressions, such as 1+(2 * 3-(4*(5+6)-(7))...?

+2  A: 

A grammar like:

expr :: expr op expr

is hard to work with because the recursion just keeps diving into the left.

A normal arithmetic grammar would look something like:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' expr
atom :: Word(nums) | '(' + expr + ')'

Basically, you never get S :: S; any time a nonterminal appears on the left and right hand sides of a line in the grammar, there must be some literal in the middle for the parser to consume.

John Fouhy
+3  A: 

Is this more or less what you want...?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print s,'->', results

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

emitting

(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

? This "anchors" the recursion by separating an "atom" (number or parenthesized expression) from an "expression" (one or more "atoms" with operators in-between).

Alex Martelli
+5  A: 

Wow, I guess pyparsing is really on the map! Thanks Alex and John for stepping in on this question. You are both on the mark with your responses. But let me add a comment or two:

  1. If we suppress the opening and closing parenthesis symbols, and group the parenthesized expression using Group, pyparsing will a structured result that is closer to an AST.

    from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group
    
    
    def Syntax():
        op = oneOf( '+ - / *')
        lpar  = Literal( '(' ).suppress()
        rpar  = Literal( ')' ).suppress()
        num = Word(nums)
        expr = Forward()
        atom = num | Group( lpar + expr + rpar )
        expr << atom + ZeroOrMore( op + expr )
        return expr
    
    
    if __name__ == "__main__":
        expr = Syntax()
        def test(s):
            results = expr.parseString( s )
            print s,'->', results
    
    
    
    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )
    

    Giving:

    (9 + 3) -> [['9', '+', '3']]
    (9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]
    

    Otherwise, pyparsing is just tokenizing, and you have to walk the list of parsed tokens to find the nested expressions.

  2. Since op is defined as just oneOf("+ - * /"), there is no precedence of operations. There are examples on the pyparsing wiki of the manual way to define this (fourFn.py), or the more recent approach using the operatorPrecedence helper (simpleArith.py). Again, this has pyparsing adding more value than just tokenizing.

To the OP, please check out those examples, I think they will help move you forward on your project.

-- Paul

Paul McGuire