I am looking for a simple way to split the parenthesized lists that come out of IMAP responses into Python lists or tuples. I want to go from

'(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'

to

("BODYSTRUCTURE", ("text", "plain", ("charset", "ISO-8859-1"), None, None, "quoted-printable", 1207, 50, None, None, None, None))

Falmarri

A:

pyparsing's nestedExpr parser function parses nested parentheses by default:

from pyparsing import nestedExpr

text = '(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'

print(nestedExpr().parseString(text))

prints:

[['BODYSTRUCTURE', ['"text"', '"plain"', ['"charset"', '"ISO-8859-1"'], 'NIL', 'NIL', '"quoted-printable"', '1207', '50', 'NIL', 'NIL', 'NIL', 'NIL']]]

Here is a slightly modified parser that does the conversions at parse time: integer strings become integers, "NIL" becomes None, and quoted strings have their quotes stripped:

from pyparsing import (nestedExpr, Literal, Word, alphanums,
    quotedString, replaceWith, nums, removeQuotes)

# Parse actions convert each token as it is matched.
NIL = Literal("NIL").setParseAction(replaceWith(None))     # "NIL" -> None
integer = Word(nums).setParseAction(lambda t: int(t[0]))   # "1207" -> 1207
quotedString.setParseAction(removeQuotes)                  # '"text"' -> 'text'
content = (NIL | integer | Word(alphanums))

# `text` is the string defined in the first example.
print(nestedExpr(content=content, ignoreExpr=quotedString).parseString(text))

Prints:

[['BODYSTRUCTURE', ['text', 'plain', ['charset', 'ISO-8859-1'], None, None, 'quoted-printable', 1207, 50, None, None, None, None]]]
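
If nested tuples are wanted rather than nested lists, as in the question, one possible follow-up is a small recursive conversion. This is only a sketch: `to_tuples` is a hypothetical helper, not part of pyparsing, and it assumes the parse result has first been turned into plain Python lists (for example by calling `asList()` on the returned `ParseResults`). The list literal below stands in for that result so the snippet runs on its own.

```python
def to_tuples(item):
    """Recursively convert nested lists into nested tuples (hypothetical helper)."""
    if isinstance(item, list):
        return tuple(to_tuples(child) for child in item)
    return item  # strings, ints and None are returned unchanged

# Plain-list form of the parsed result; assumed to come from .asList().
parsed = [['BODYSTRUCTURE', ['text', 'plain', ['charset', 'ISO-8859-1'],
           None, None, 'quoted-printable', 1207, 50,
           None, None, None, None]]]

# The result has one extra outer list around the parenthesized
# group, so element 0 is the group itself.
result = to_tuples(parsed[0])
print(result)
```

Taking element 0 before converting drops the extra outer list, so `result` has the same shape as the tuple asked for in the question.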
Paul McGuire