Is it possible to give pyparsing a parsed list and have it return the original string?

+2  A: 

Yes, as long as you've told the parser not to throw away any input. Wrap the expression in Combine, which joins the matched tokens back into a single string.

Let's say your input is:

>>> s = 'abc,def,  ghi'

Here's a parser that grabs the exact text of the list:

>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> myList.leaveWhitespace()
>>> myList.parseString(s)
(['abc', ',', 'def', ',', '  ', 'ghi'], {})

To "deparse":

>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def,  ghi'], {})

which gives you the initial input back.

But this comes at a cost: having all that extra whitespace floating around as tokens is usually inconvenient, and note that we had to explicitly turn off whitespace skipping in myList with leaveWhitespace(). Here's a version that lets the parser skip whitespace:

>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})

Note that you're no longer getting the literal input back, but this may be good enough for you. Also note that we had to pass adjacent=False to tell Combine to allow whitespace between the tokens.

Really, though, in many cases you don't even care about the delimiters; you want the parser to focus on the items themselves. There's a predefined expression called commaSeparatedList that conveniently strips both delimiters and whitespace for you:

>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})

In this case, though, the "deparsing" step doesn't have enough information for the reconstituted string to make sense:

>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})
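
If all you have left is the list of items, one workaround is to rebuild a string outside pyparsing by joining the tokens with a delimiter you already know. This is only a minimal sketch: the `rejoin` helper and its default delimiter are hypothetical names chosen for illustration, and the result is a normalized string, not necessarily the original input, because the original spacing is gone.

```python
def rejoin(tokens, delimiter=","):
    """Join parsed items back into one delimited string.

    Hypothetical helper: it cannot restore whitespace or delimiters
    that the parser discarded, so the output is normalized text.
    """
    return delimiter.join(tokens)


# Items as a delimiter-stripping parser would return them for 'abc,def,  ghi'
tokens = ['abc', 'def', 'ghi']
normalized = rejoin(tokens)
print(normalized)
```

Passing a different delimiter, such as `', '`, gives a more readable form, but either way the spacing of the input string is not recovered.
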
Owen S.
I gotta say for such a vague question you really took this one and ran. Love it!
jathanism
Great answer and thanks for pitching in on the topic of pyparsing! Also check out the recently-added `originalTextFor` helper, which offers capabilities similar to those you describe, but which can preserve even the intervening whitespace.
Paul McGuire
Cool tip, Paul! I was looking for just such a thing but didn't find it because 1) the API documentation link on the pyparsing page is broken, and 2) the UCSC online docs I did find are probably dated. Hopefully we can get one or the other updated! I'll take a look at the latest source+doc.
Owen S.
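
For readers following up on the `originalTextFor` helper mentioned in the comments, here is a rough sketch of how it might be used on the same input. It assumes a pyparsing version that still exports the camelCase names (`originalTextFor`, `parseString`); the `items` grammar is a stand-in written for this example, not the answer's exact parser.

```python
from pyparsing import Suppress, Word, ZeroOrMore, alphas, originalTextFor

s = 'abc,def,  ghi'

# Stand-in grammar: items separated by commas, with the commas suppressed
# and whitespace skipped as usual.
items = Word(alphas) + ZeroOrMore(Suppress(',') + Word(alphas))

# Parsing normally yields just the items.
tokens = items.parseString(s).asList()

# Wrapping the same expression in originalTextFor returns the slice of the
# input that the expression matched, including the skipped whitespace.
original = originalTextFor(items).parseString(s)[0]

print(tokens)
print(original)
```

Unlike Combine, this needs neither leaveWhitespace() nor adjacent=False, since the text is sliced from the input string instead of being reassembled from tokens.
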