ansaurus

Question

Implementing parser for markdown-like language

Answer 1

+4 A:

If one construct can contain another, you normally treat the markers as separate tokens and then express the nesting in the grammar. Lepl (http://www.acooke.org/lepl, which I wrote) and PyParsing (which is probably the most popular pure-Python parser) both allow you to nest things recursively.

So in Lepl you could write code something like:

from lepl import *

# these are tokens (defined as regexps)
stg_marker = Token(r'\*\*')
# tokens are longest match, so strong is preferred if possible
emp_marker = Token(r'\*')
spo_marker = Token(r'%%')
....
# grammar rules combine tokens
contents = Delayed() # this will be defined later and lets us recurse
strong = stg_marker + contents + stg_marker
emphasis = emp_marker + contents + emp_marker
spoiler = spo_marker + contents + spo_marker
other_stuff = .....
contents += strong | emphasis | spoiler | other_stuff # this defines contents recursively

Then you can see, I hope, how contents will match nested use of strong, emphasis, etc.
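To make the recursion concrete without depending on any parser library, here is a minimal hand-rolled sketch of the same idea using only the standard library. It is not Lepl or PyParsing code; the names (TOKEN_RE, MARKERS, tokenize, parse_contents, parse) and the tuple-based tree are assumptions made for illustration, and an unclosed marker is simply reported as an error.

```python
import re

# Tokens, listed so that '**' is tried before '*' (mimicking "longest match").
# Anything that is not a marker character is plain text; a lone '%' is text too.
TOKEN_RE = re.compile(r"\*\*|\*|%%|[^*%]+|%")

# Hypothetical mapping from marker token to node name.
MARKERS = {"**": "strong", "*": "emphasis", "%%": "spoiler"}


def tokenize(text):
    """Split the input into marker tokens and runs of plain text."""
    return TOKEN_RE.findall(text)


def parse_contents(tokens, pos, stop):
    """Parse tokens from pos until the closing marker `stop` (or end of input).

    Returns (nodes, pos), where pos is the index of the closing marker,
    or len(tokens) if the input ran out first.
    """
    nodes = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == stop:
            return nodes, pos
        if tok in MARKERS:
            # An opening marker: recurse to collect its contents.
            inner, end = parse_contents(tokens, pos + 1, tok)
            if end >= len(tokens):
                raise ValueError("unclosed marker %r" % tok)
            nodes.append((MARKERS[tok], inner))
            pos = end + 1  # skip the closing marker
        else:
            nodes.append(tok)
            pos += 1
    return nodes, pos


def parse(text):
    """Parse a whole string into a nested list of text and (name, children) tuples."""
    nodes, _ = parse_contents(tokenize(text), 0, None)
    return nodes


if __name__ == "__main__":
    print(parse("a **b *c* d** %%e%%"))
```

Each marker opens a nested call to parse_contents, which plays the role that the recursive contents rule plays in the grammar above. A parser library gives you backtracking and better error handling on top of this; the sketch only shows the shape of the recursion.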

There's much more than this to do for your final solution, and efficiency could be an issue in any pure-Python parser. There are some parsers that are implemented in C but callable from Python; these will be faster, but may be trickier to use. I can't recommend any because I haven't used them.

andrew cooke 2010-08-21 12:36:58

See http://stackoverflow.com/questions/3495019/parsing-latex-like-language-in-java for a similar solution.

Ira Baxter 2010-08-21 16:23:52
