Is it possible to give pyparsing a parsed list and have it return the original string?
Yes, you can, provided you've instructed the parser not to throw away any input. You do it with the Combine class.
Let's say your input is:
>>> s = 'abc,def, ghi'
Here's a parser that grabs the exact text of the list:
>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> myList.leaveWhitespace()
>>> myList.parseString(s)
(['abc', ',', 'def', ',', ' ', 'ghi'], {})
To "deparse":
>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def, ghi'], {})
which gives you the initial input back.
But this comes at a cost: having all that extra whitespace floating around as tokens is usually not convenient, and you'll note that we had to explicitly turn whitespace skipping off in myList.
Here's a version that strips whitespace:
>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})
Note you're not getting the literal input back at this point, but this may be good enough for you. Also note we had to explicitly tell Combine to allow the skipping of whitespace.
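If you do need the literal input while still letting the parser skip whitespace, pyparsing's originalTextFor helper is one option. The following is a minimal sketch rather than the only way to do it; it assumes a pyparsing version where the camelCase names used in this answer (originalTextFor, parseString) are available, and the names my_list, literal_list and recovered are just illustrative.

```python
from pyparsing import Word, ZeroOrMore, alphas, originalTextFor

s = 'abc,def, ghi'

# Same whitespace-skipping grammar as the version above.
my_list = Word(alphas) + ZeroOrMore(',' + Word(alphas))

# originalTextFor wraps an expression and returns the slice of the input
# that the expression matched, as a single string token.
literal_list = originalTextFor(my_list)

recovered = literal_list.parseString(s)[0]
print(recovered == s)
```

The individual tokens are replaced by that single string, so this is a trade-off in the same spirit as Combine: you get the text back, not the list.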
Really, though, in many cases you don't even care about the delimiters; you want the parser to focus on the items themselves. There's a predefined expression called commaSeparatedList that conveniently strips both delimiters and whitespace for you:
>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})
In this case, though, the "deparsing" step doesn't have enough information for the reconstituted string to make sense:
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})
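If you already know which delimiter was used, you can put it back yourself with plain Python instead of asking the parser to do it. This is only a sketch: rejoin is a hypothetical helper, not part of pyparsing, and the original spacing around the commas is lost either way.

```python
def rejoin(items, delimiter=','):
    """Join parsed items with a known delimiter (hypothetical helper)."""
    return delimiter.join(items)

# The items as returned by commaSeparatedList in the example above.
items = ['abc', 'def', 'ghi']

compact = rejoin(items)        # delimiter only
spaced = rejoin(items, ', ')   # delimiter plus one space after each comma
print(compact)
print(spaced)
```

Neither result is guaranteed to match the input exactly, since the parser never recorded how much whitespace followed each comma.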