Hi all, I'm writing a very simple BBCode parser. If I want to convert hello, i'm a [b]bold[/b] text, it works when I replace this regex

r'\[b\](.*)\[\/b\]'

with this

<strong>\g<1></strong>

to get hello, i'm a <strong>bold</strong> text.

If I have two or more tags of the same type, it fails, e.g.:

i'm [b]bold[/b] and i'm [b]bold[/b] too

gives

i'm <strong>bold[/b] and i'm [b]bold</strong> too

How can I solve this problem? Thanks

+7  A: 

You shouldn't use regular expressions to parse non-regular languages (like matching tags). Look into a parser instead.

Edit - a quick Google search takes me here.

danben
I'm new to Python. I know this post was a long time ago, but why would a parser be recommended over a regex? How do the two process things differently? Thanks
Mike Hayes
@Mike Hayes: This isn't specific to Python - it is language theory. One simple example of why you need a parser to parse something like matching tags is the string `<b>I am nesting my <b>bold tags</b></b>`. If you just match between pairs of `<b>` and `</b>`, you get the wrong text in this example. To learn more, you should read about the difference between regular languages (for which you can use regular expressions) and context-free languages (for which you need a parser).
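To make the nesting problem concrete, here is a minimal sketch in Python using the BBCode [b] tag from the question instead of `<b>`. The helper name `convert_bold` and the sample string are made up for illustration, and the depth counter is only a stand-in for what a real parser does; it is not a complete BBCode parser.

```python
import re

nested = "[b]outer [b]inner[/b] tail[/b]"

# A non-greedy regex pairs the first [b] with the first [/b],
# which is the wrong partner once tags are nested.
naive = re.sub(r"\[b\](.*?)\[/b\]", r"<strong>\g<1></strong>", nested)
print(naive)  # <strong>outer [b]inner</strong> tail[/b]

# Hypothetical helper: split the text into tags and plain text,
# then track how many [b] tags are currently open.
TOKEN = re.compile(r"(\[/?b\])")

def convert_bold(text):
    out = []
    depth = 0
    for token in TOKEN.split(text):
        if token == "[b]":
            depth += 1
            out.append("<strong>")
        elif token == "[/b]" and depth > 0:
            depth -= 1
            out.append("</strong>")
        else:
            # Plain text, or a stray [/b] with nothing open: keep it as-is.
            out.append(token)
    # Close anything the input left open.
    out.append("</strong>" * depth)
    return "".join(out)

print(convert_bold(nested))  # <strong>outer <strong>inner</strong> tail</strong>
```

The counter is the part a plain regular expression cannot express: it has to remember how many tags are open, and that number is unbounded.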
danben
+5  A: 

Just change your regular expression from:

r'\[b\](.*)\[\/b\]'

to

r'\[b\](.*?)\[\/b\]'

The * qualifier is greedy; appending a ? to it makes it non-greedy, so it matches as few characters as possible.

Here's a more complete explanation taken from the Python re documentation:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.

Source: http://docs.python.org/library/re.html
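As a quick check, the following sketch runs both patterns with re.sub on the string from the question. The variable names are arbitrary; the unneeded backslash before the slash is dropped, which does not change what the pattern matches.

```python
import re

text = "i'm [b]bold[/b] and i'm [b]bold[/b] too"
replacement = r"<strong>\g<1></strong>"

# Greedy: .* runs to the last [/b], so both tags collapse into one match.
greedy = re.sub(r"\[b\](.*)\[/b\]", replacement, text)
print(greedy)  # i'm <strong>bold[/b] and i'm [b]bold</strong> too

# Non-greedy: .*? stops at the first [/b], so each tag pair matches separately.
lazy = re.sub(r"\[b\](.*?)\[/b\]", replacement, text)
print(lazy)  # i'm <strong>bold</strong> and i'm <strong>bold</strong> too
```

Note that . does not match newlines by default, so a tag pair spanning several lines would also need the re.DOTALL flag.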

Andrea Zilio
oh, non greedy match.. that worked, thanks :)
pistacchio