How does one implement a parser (in Python) for a subset of wikitext that modifies text, namely:

*bold*, /italics/, _underline_

I'm converting it to LaTeX, so the conversion is from:

Hello, *world*! Let's /go/.

to:

Hello, \textbf{world}! Let's \textit{go}.

There's nothing specific to LaTeX about the conversion, though, except perhaps for nested cases like "*bold /italics* whatami/" => "\textbf{bold \textit{italics} whatami}".

I've looked at existing markup libraries, but they're (a) not quite the wiki language I'd like, and (b) seemingly overpowered for this problem.

I've considered reverse engineering Creoleparser, but I'd like to know what suggestions others have before I undertake that effort.

Thanks!

+4  A: 

If your language is small, regular expressions might be the least painful solution:

>>> import re
>>> text = "Hello, *world*! Let's /go/."
>>> # Double the backslash in the replacement: re.sub would read "\t" as a tab.
>>> text = re.sub(r"\*([^*]*)\*", r"\\textbf{\1}", text)
>>> text = re.sub(r"/([^/]*)/", r"\\textit{\1}", text)
>>> print(text)
Hello, \textbf{world}! Let's \textit{go}.
Can Berk Güder
+1... for example, Markdown's Python implementation is largely done with regex search-and-replace.
David Zaslavsky
+1 just for using regex. I should seriously learn it.
Rodrigo
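
For what it's worth, the regex approach from the answer can be sketched as a small rule table that also covers _underline_. This is only a rough sketch, not anything from the original posts: the names RULES and wiki_to_latex are made up for illustration, and mapping _underline_ to \underline is an assumption. It uses a function as the replacement so the backslash in the LaTeX command is never interpreted as an escape by re.sub.

```python
import re

# Hypothetical rule table: (marker pattern, LaTeX command name).
# The names and the \underline mapping are assumptions for illustration.
RULES = [
    (re.compile(r"\*([^*]*)\*"), "textbf"),
    (re.compile(r"/([^/]*)/"), "textit"),
    (re.compile(r"_([^_]*)_"), "underline"),
]

def wiki_to_latex(text):
    """Apply each rule in turn, wrapping the marked text in a LaTeX command."""
    for pattern, command in RULES:
        # A function replacement returns the string literally, so the
        # leading backslash is not processed as a template escape.
        text = pattern.sub(
            lambda m, c=command: "\\%s{%s}" % (c, m.group(1)), text
        )
    return text

print(wiki_to_latex("Hello, *world*! Let's /go/."))
# Hello, \textbf{world}! Let's \textit{go}.
print(wiki_to_latex("*bold /italics* whatami/"))
# \textbf{bold \textit{italics} whatami}
```

Because each pass only looks for its own pair of markers, the overlapping example from the question happens to come out the way the asker wanted. The same simplicity is the weak spot: any stray /, * or _ in ordinary text (a file path, a URL, a snake_case name) will be treated as markup, so a real tokenizer may be needed if the input isn't tightly controlled.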