After poking around for a bit more, I came upon this help thread where there was this notable bit
I often see inefficient grammars when someone implements a pyparsing grammar directly from a BNF definition. BNF does not have a concept of "one or more" or "zero or more" or "optional"...
With that, I got the idea to change these two lines
multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))
To
multi_line_word = ZeroOrMore(split_word) + word
This got it to output what I was looking for: ['super', 'cali', fragi', 'listic']
.
Next, I added a parse action that would join these tokens together:
multi_line_word.setParseAction(lambda t: ''.join(t))
This gives a final output of ['supercalifragilistic']
.
The take home message I learned is that one doesn't simply walk into Mordor.
Just kidding.
The take home message is that one can't simply implement a one-to-one translation of BNF with pyparsing. Some tricks with using the iterative types should be called into use.
EDIT 2009-11-25: To compensate for the more strenuous test cases, I modified the code to the following:
no_space = NotAny(White(' \t\r'))
# make sure that the EOL immediately follows the escape backslash
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
# make sure that the escape backslash immediately follows the word
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))
This has the benefit of making sure that no space comes between any of the elements (with the exception of newlines after the escaping backslashes).