ansaurus

Question

PyParsing: Not all tokens passed to setParseAction()

Answer 1

+2 A:

It works better if you set the parse action on the whole expression, that is, on both course and the Optional (you were setting it only on the Optional!):

>>> statement = (course + Optional(OR_CONJ + course)).setParseAction(statementParse).setDebug()
>>> statement.parseString("CS 2110 or INFO 3300")

gives

Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}') [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}]} at loc 0(1,1)
string CS 2110 or INFO 3300
loc: 0 
tokens: ['CS', 2110, 'INFO', 3300]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}') [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}]} -> ['CS', 2110, 'INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})

though I suspect what you actually want is to set the parse action on each course, not on the statement:

>>> statement = course + Optional(OR_CONJ + course)
>>> statement.parseString("CS 2110 or INFO 3300")
Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} at loc 0(1,1)
string CS 2110 or INFO 3300
loc: 0 
tokens: ['CS', 2110]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} -> ['CS', 2110]
Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} at loc 10(1,11)
string CS 2110 or INFO 3300
loc: 10 
tokens: ['INFO', 3300]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})
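The session above does not show how course itself was defined, so here is a minimal sketch of the per-course variant. The grammar is reconstructed from the debug output and is an assumption, as is the parse action name courseParse, which only records the tokens it receives.

```python
from pyparsing import Optional, Regex, Suppress

# Grammar reconstructed from the debug output above (an assumption).
DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = (Regex(r'[0-9]{4}')
                 .setParseAction(lambda t: int(t[0]))  # '2110' -> 2110
                 .setResultsName("Course"))
OR_CONJ = Suppress("or")

seen = []  # one entry per call of the parse action


def courseParse(tokens):
    # Hypothetical parse action: receives the tokens of a single course.
    seen.append(tokens.asList())


# The parse action is attached to each course, not to the whole statement.
course = (DEPT_CODE + COURSE_NUMBER).setParseAction(courseParse)
statement = course + Optional(OR_CONJ + course)

result = statement.parseString("CS 2110 or INFO 3300")
print(seen)             # tokens seen by each call of courseParse
print(result.asList())  # flat token list for the whole statement
```

Each call of courseParse sees only the tokens of one course, while the overall result still contains the tokens of both.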

Alex Martelli 2010-05-31 00:29:16

Answer 2

+2 A:

In order to keep the token bits from "CS 2110" and "INFO 3300", I suggest you wrap your definition of course in a Group:

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")
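As a rough illustration of what Group changes, the sketch below parses the same input with a grouped course. The definitions of DEPT_CODE and COURSE_NUMBER follow the ones used later in this answer; the results name on the Group is left off here to keep the example small, and the statement expression is an assumption based on the question.

```python
from pyparsing import Group, Optional, Regex, Suppress

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

# Group keeps the two tokens of a course together as one nested result.
course = Group(DEPT_CODE + COURSE_NUMBER)
statement = course + Optional(Suppress("or") + course)

result = statement.parseString("CS 2110 or INFO 3300")
print(result.asList())            # [['CS', '2110'], ['INFO', '3300']]
print(result[0]["DeptCode"])      # CS
print(result[1]["CourseNumber"])  # 3300
```

Because each course is its own sub-result, the named fields of the first course are no longer mixed up with those of the second.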

It also looks like you are charging head-on at parsing some kind of search expression, like "x and y or z". There is some subtlety to this problem, and I suggest you check out the examples on the pyparsing wiki that show how to build up these kinds of expressions. Otherwise you will end up with a bird's nest of Optional("or" + this) and ZeroOrMore("and" + that) pieces. As a last resort, you could just use operatorPrecedence, like:

from pyparsing import Regex, Group, operatorPrecedence, opAssoc

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
course = Group(DEPT_CODE + COURSE_NUMBER)

# Operators are listed from highest to lowest precedence.
courseSearch = operatorPrecedence(course,
    [
    ("not", 1, opAssoc.RIGHT),
    ("and", 2, opAssoc.LEFT),
    ("or", 2, opAssoc.LEFT),
    ])

(You may have to download the latest 1.5.3 version from the SourceForge SVN for this to work.)
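To see what such a grammar returns, here is a small usage sketch. It uses infixNotation, the name under which later pyparsing releases expose operatorPrecedence; on the 1.5.x releases mentioned above you would call operatorPrecedence instead. The query string is made up for illustration.

```python
from pyparsing import Group, Regex, infixNotation, opAssoc

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
course = Group(DEPT_CODE + COURSE_NUMBER)

# Operators are listed from highest to lowest precedence,
# so "and" binds tighter than "or".
courseSearch = infixNotation(course, [
    ("not", 1, opAssoc.RIGHT),
    ("and", 2, opAssoc.LEFT),
    ("or", 2, opAssoc.LEFT),
])

query = "CS 2110 and CS 3110 or INFO 3300"  # hypothetical input
tree = courseSearch.parseString(query).asList()
print(tree)
# [[[['CS', '2110'], 'and', ['CS', '3110']], 'or', ['INFO', '3300']]]
```

The nesting of the result mirrors the precedence: the "and" pair is grouped first and then becomes the left operand of "or".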

Paul McGuire 2010-05-31 01:02:44

Yeah, "charging head-on" is a pretty good way of describing what I'm doing. Thanks for the pointer to the example on the wiki.

Rosarch 2010-05-31 01:28:11
