
I'm trying to write something that will parse some code. I can successfully parse foo(spam) and spam+eggs, but foo(spam+eggs) fails (recursive descent? my compiler terminology is a bit rusty).

I have the following code:

from pyparsing_py3 import *

myVal = Word(alphas+nums+'_')    
myFunction = myVal + '(' + delimitedList( myVal ) + ')'

myExpr = Forward()
mySubExpr = ( \
    myVal \
    | (Suppress('(') + Group(myExpr) + Suppress(')')) \
    | myFunction \
    )
myExpr << Group( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )


# SHOULD return: [blah, [foo, +, bar]]
# but actually returns: [blah]
print(myExpr.parseString('blah(foo+bar)'))
A:

Several issues. First, delimitedList is looking for a comma-delimited list of myVal, i.e. identifiers, as the only acceptable form of argument list, so of course it can't match 'foo+bar' (which is not a comma-delimited list of myVal!). Fixing that reveals another: myVal and myFunction start the same way, so their order in mySubExpr matters. Fixing that reveals yet another: TWO levels of nesting instead of one (myExpr itself was wrapped in a Group, on top of the Group around it in mySubExpr). This version seems OK:

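from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphas, delimitedList, nums, oneOf)

myVal = Word(alphas+nums+'_')

myExpr = Forward()
mySubExpr = (
    # a parenthesized sub-expression
    (Suppress('(') + Group(myExpr) + Suppress(')'))
    # a function call: must be tried before the bare identifier
    | myVal + Suppress('(') + Group(delimitedList(myExpr)) + Suppress(')')
    # a bare identifier
    | myVal
    )
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )

print(myExpr.parseString('blah(foo+bar)'))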

This emits ['blah', ['foo', '+', 'bar']] as desired. I also removed the redundant backslashes: logical line continuation happens anyway within parentheses, so they were innocuous but did hamper readability.
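On the terminology question: pyparsing builds recursive-descent parsers, and its `|` operator (MatchFirst) is an ordered choice that takes the first alternative that matches. That is why the order of alternatives in mySubExpr matters. Below is a minimal hand-written sketch of the same grammar in plain Python, not pyparsing's implementation; the names `tokenize`, `Parser` and `parse` are hypothetical. It also insists on consuming the whole input, roughly what passing `parseAll=True` to `parseString` does in pyparsing, so a partial match like the question's `[blah]` becomes an error instead of a silent truncation.

```python
import re

# Identifiers (roughly Word(alphas+nums+'_')), plus operators, parens and commas.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z0-9_]+)|([-+/*=(),]))")
IDENT_RE = re.compile(r"[A-Za-z0-9_]+")
OPS = {"+", "-", "/", "*", "="}


def tokenize(text):
    """Split text into identifier and punctuation tokens; whitespace is skipped."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos:pos + 1]!r}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def expr(self):
        # expr := subexpr (op subexpr)*  -> one flat list, like myExpr
        result = self.subexpr()
        while self.peek() in OPS:
            result.append(self.peek())
            self.i += 1
            result.extend(self.subexpr())
        return result

    def subexpr(self):
        tok = self.peek()
        if tok == "(":
            # '(' expr ')' -> the inner expression as one nested list
            self.i += 1
            inner = self.expr()
            self.expect(")")
            return [inner]
        if tok is None or not IDENT_RE.fullmatch(tok):
            raise SyntaxError(f"expected identifier or '(', got {tok!r}")
        self.i += 1
        if self.peek() == "(":
            # identifier '(' expr (',' expr)* ')' -> a function call.
            # Commas are dropped and all argument tokens go into one list
            # (a simplification made for this sketch).
            self.i += 1
            args = self.expr()
            while self.peek() == ",":
                self.i += 1
                args.extend(self.expr())
            self.expect(")")
            return [tok, args]
        return [tok]  # a bare identifier


def parse(text):
    p = Parser(text)
    result = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"unparsed input starting at token {p.peek()!r}")
    return result


print(parse("blah(foo+bar)"))  # ['blah', ['foo', '+', 'bar']]
```

The function-call branch peeks for '(' right after the identifier, which is the hand-written equivalent of listing the call alternative before the bare myVal.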

Alex Martelli
A:

I've found that a good habit to get into when using the '<<' operator with Forwards is to always enclose the RHS in parentheses. That is:

myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )

is better as:

myExpr << ( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )

This is a result of my unfortunate choice of '<<' as the "insertion" operator for inserting the expression into a Forward. The parentheses are unnecessary in this particular case, but in this one:

integer = Word(nums)
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) | integer

we see why I say "unfortunate". If I simplify this to "A << B | C", we easily see that the precedence of operations causes evaluation to be performed as "(A << B) | C", since '<<' has higher precedence than '|'. The result is that the Forward A only gets the expression B inserted in it. The "| C" part does get executed, but what it produces is "A | C", a MatchFirst object that is immediately discarded, since it is not assigned to any variable name. The solution is to group the statement within parentheses as "A << (B | C)".

In expressions composed only using '+' operations, there is no actual need for the parentheses, since '+' has a higher precedence than '<<'. But this is just lucky coding, and it causes problems when someone later adds an alternative expression using '|' and doesn't realize the precedence implications. So I suggest just adopting the style "A << (expression)" to help avoid this confusion.
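To see the precedence trap without installing pyparsing, here is a minimal sketch with two hypothetical stand-in classes, Expr and Forward (not pyparsing's real classes), whose operators only record what was built. As in the description above, the toy `<<` stores the expression and returns the Forward itself.

```python
# Hypothetical stand-ins for pyparsing's ParserElement and Forward; they only
# record what the operators do, they do not parse anything.
class Expr:
    def __init__(self, desc):
        self.desc = desc

    def __or__(self, other):
        # In pyparsing, '|' builds a MatchFirst; here we just build a label.
        return Expr(f"MatchFirst({self.desc}, {other.desc})")

    def __repr__(self):
        return self.desc


class Forward(Expr):
    def __init__(self):
        super().__init__("Forward")
        self.expr = None  # what '<<' inserted

    def __lshift__(self, other):
        # Store the inserted expression and return self, so 'A << B' is still A.
        self.expr = other
        return self


B, C = Expr("B"), Expr("C")

A = Forward()
A << B | C        # evaluated as (A << B) | C; the resulting MatchFirst is discarded
print(A.expr)     # B

A = Forward()
A << (B | C)      # parentheses force B | C to be built first
print(A.expr)     # MatchFirst(B, C)
```

Swapping in the real pyparsing classes should show the same split, since the grouping is decided by Python's own parser before any library code runs.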

(Someday I will write pyparsing 2.0 - which will allow me to break compatibility with existing code - and change this to use the '<<=' operator, which fixes all of these precedence issues, since '<<=' has lower precedence than any of the other operators used by pyparsing.)
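Later pyparsing releases do accept `<<=` on a Forward (for example `myExpr <<= mySubExpr + ZeroOrMore(...)`). It sidesteps the problem because `<<=` is an augmented assignment statement rather than an operator inside the expression, so the whole right-hand side is evaluated first. A quick way to confirm this, using only the standard library's `ast` module (A, B and C are bare names here, not parser objects), is to compare how Python parses the two spellings:

```python
import ast

shift = ast.parse("A << B | C", mode="eval").body
aug = ast.parse("A <<= B | C").body[0]

# 'A << B | C' is a BitOr whose left operand is the shift: (A << B) | C
print(type(shift).__name__, type(shift.op).__name__, type(shift.left.op).__name__)
# BinOp BitOr LShift

# 'A <<= B | C' is an augmented-assignment statement whose value is all of B | C
print(type(aug).__name__, type(aug.op).__name__, type(aug.value.op).__name__)
# AugAssign LShift BitOr
```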

Paul McGuire
Paul, that's very useful information on operator precedence that I hadn't considered. Thank you for that, and also thank you for a wonderful contribution to the Python community! I enjoyed 'Getting Started with Pyparsing', but I apparently need a bit more experience.
ash