I'm trying to get a regexp to match some nested tags. (Yes, I know I should use a parser, but my input will always be correct.)

Example:

Text.
More text.
[quote]
First quote
[quote]
Nested second quote.
[/quote]
[/quote]

Let's say I want the regexp to simply change the tags to <blockquote>:

Text.
More text.
<blockquote>
First quote
<blockquote>
Nested second quote.
</blockquote>
</blockquote>

How would I do this, matching both opening and closing tags at the same time?

+3  A: 

If you don’t need to check that the tags are balanced, you can use a simple string replacement and replace each tag separately. Here’s an example using PHP’s str_replace to replace the opening and closing tags:

$str = str_replace('[quote]', '<blockquote>', $str);
$str = str_replace('[/quote]', '</blockquote>', $str);

Or with the help of a regular expression (PHP again):

$str = preg_replace('~\[(/?)quote]~', '<$1blockquote>', $str);

Here the matches of \[(/?)quote] are replaced with <$1blockquote>, where $1 stands for whatever the first group of the pattern, (/?), matched: either / or the empty string.
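
Since the question does not name a language, here is a rough Python equivalent of the same single-pattern idea. The function name convert_quote_tags is made up for illustration; re.sub and the \1 backreference are standard Python.

```python
import re

def convert_quote_tags(text: str) -> str:
    """Turn [quote] / [/quote] into <blockquote> / </blockquote>.

    Group 1 captures the optional slash, and \\1 copies it into the
    replacement, so one pattern handles both tag kinds. No balance
    checking is done here.
    """
    return re.sub(r'\[(/?)quote\]', r'<\1blockquote>', text)
```

Calling convert_quote_tags on the question's example text rewrites all four tags in one pass and leaves the surrounding text untouched.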

But you should really use a parser that keeps track of the opening and closing tags. Otherwise you can have an opening or closing tag that doesn’t have a counterpart or (if you’re using further tags) that is not nested properly.
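
As a minimal sketch of what "keeping track" could look like for a single tag type, the Python below counts nesting depth while replacing and rejects unbalanced input. It is not a full BBCode parser; the name convert_balanced and the choice to raise ValueError are assumptions made for this example.

```python
import re

TAG = re.compile(r'\[(/?)quote\]')

def convert_balanced(text: str) -> str:
    """Replace quote tags with blockquote tags, verifying they are balanced."""
    depth = 0  # number of currently open [quote] tags

    def replace(match: re.Match) -> str:
        nonlocal depth
        if match.group(1):  # a slash was captured: this is a closing tag
            if depth == 0:
                raise ValueError('closing [/quote] without an opening [quote]')
            depth -= 1
            return '</blockquote>'
        depth += 1
        return '<blockquote>'

    # re.sub calls replace() for each match, left to right.
    result = TAG.sub(replace, text)
    if depth != 0:
        raise ValueError(f'{depth} [quote] tag(s) were never closed')
    return result
```

With more than one tag type, the counter would have to become a stack of open tag names so that improper nesting can be detected as well.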

Gumbo
Are you sure OP wants PHP?
KennyTM
@KennyTM: Ah, thanks for the remark. I don’t know how I assumed that he wants to use PHP.
Gumbo
+2  A: 

You can't match (arbitrarily) nested stuff with regular expressions.

But you can replace every instance of [quote] with <blockquote> and [/quote] with </blockquote>.
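
To make the "arbitrarily" part concrete: a pattern without recursion can be written for a fixed maximum depth, but each extra level has to be spelled out by hand. The Python sketch below accepts one quote containing at most one level of nested quotes; all names in it are hypothetical.

```python
import re

# One character that does not start a quote tag.
PLAIN_CHAR = r'(?:(?!\[/?quote\]).)'
# A quote with no quotes inside it.
FLAT = r'\[quote\]' + PLAIN_CHAR + r'*\[/quote\]'
# A quote whose body is plain text and/or flat quotes (two levels in total).
TWO_LEVELS = re.compile(
    r'\[quote\](?:' + PLAIN_CHAR + r'|' + FLAT + r')*\[/quote\]',
    re.DOTALL,
)

def is_quote_up_to_two_levels(text: str) -> bool:
    """True if text is exactly one quote nested at most two levels deep."""
    return TWO_LEVELS.fullmatch(text) is not None
```

A third level of nesting is rejected by this pattern, and supporting it would mean wrapping the expression once more, which is why plain replacement of each tag is the simpler route here.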

KennyTM
Caveat: You can match nested stuff to a predetermined depth: http://blog.stevenlevithan.com/archives/regex-recursion
ghoppe
"You can't match (arbitrarily) nested stuff with regular expressions." That's the answer I was looking for :) So I used a BBCode parser: http://nbbc.sourceforge.net/
soupagain
+1  A: 

It's a lousy idea, but you're apparently trying to match something like: \[\(/?\)quote\] and replace it with: <\1blockquote>

Jerry Coffin
+1  A: 

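You could use two expressions, one per tag. The g flag makes each substitution apply to every occurrence rather than only the first:

s/\[quote\]/<blockquote>/g
s/\[\/quote\]/<\/blockquote>/g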
Micah