ansaurus

Question

Regular expression to remove <p> tags around elements wrapped in [...]'s

Answer 1

+5 A:

You can't use regular expressions to parse HTML, because HTML is, by definition, a non-regular language. Period, end of discussion.

Paul Tomblin 2010-01-31 00:31:02

Thanks for the response. I think I get what you are saying, though that's not what I meant. I edited it to clarify.

Matt 2010-01-31 01:17:52

@Matt, just because it hasn't been put into <html> tags and displayed in a browser doesn't mean it isn't HTML, or that it isn't non-regular. You might think you've handled your test cases, but trust me, somebody will come up with cases that break your regexp.

Paul Tomblin 2010-01-31 01:25:18

Answer 2

+3 A:

The language of matching HTML tags is context-free, not regular. This means regular expressions are probably not the right tool to use here. Context-free languages require parsers rather than regular expressions. So, you can either remove ALL <p> and </p> tags with a regular expression, or you can use an HTML parser to remove matching tags from certain parts of your document.
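To illustrate the first option, here is a minimal sketch in Python (an assumption; the question does not name a language) that strips every <p> and </p> tag regardless of context. The function name `strip_all_p_tags` is hypothetical. It makes no attempt to pair opening and closing tags, which is exactly why a regular expression is enough for it.

```python
import re

# Matches any opening or closing <p> tag, with or without attributes.
# The \b keeps tags such as <pre> or <param> from matching.
P_TAG = re.compile(r"</?p\b[^>]*>", re.IGNORECASE)

def strip_all_p_tags(text: str) -> str:
    """Remove every <p ...> and </p> tag, leaving the enclosed text in place."""
    return P_TAG.sub("", text)

print(strip_all_p_tags('<p class="intro">[quote]Hi[/quote]</p><pre>code</pre>'))
```

Removing only the <p> tags that wrap particular content means matching pairs, and that is where a parser becomes the safer choice.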

danben 2010-01-31 00:31:57

It's still text before it's turned into HTML. Please see the edited post.

Matt 2010-01-31 01:18:35

Answer 3

+1 A:

Try this regex:

'%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%s'

Explanation:

\[([^\[\]]+)\] matches the opening bbcode tag and captures the tag name in group #2.

\[/\2\] matches a corresponding closing tag.

.*? matches anything, reluctantly. Thanks to the s flag at the end, it also matches newlines. The effect of the reluctant .*? is that it stops matching the first time it finds a closing bbcode tag with the right name. If tags are nested (within tags with the same name) or improperly balanced, it won't work correctly. I wouldn't expect that to be a problem, but I have no experience with WordPress, so YMMV.
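For anyone who wants to try the pattern outside PHP, here is a rough Python port. It is a sketch, not a tested WordPress filter: the `%...%s` delimiters and the s flag are assumed to map to `re.DOTALL`, and the function name `unwrap_shortcodes` is made up for illustration. The replacement keeps group #1, the whole bracketed block, and drops the surrounding <p> tags.

```python
import re

# Same pattern as above, without the PHP delimiters; re.DOTALL plays the
# role of the trailing s flag so that .*? can cross newlines.
SHORTCODE_IN_P = re.compile(
    r"<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>",
    re.DOTALL,
)

def unwrap_shortcodes(text: str) -> str:
    """Drop a <p>...</p> wrapper whose only content is a [tag]...[/tag] block."""
    # \1 is the whole bracketed block; \2 (the tag name) is only used
    # inside the pattern to find the matching closing tag.
    return SHORTCODE_IN_P.sub(r"\1", text)

sample = "<p>[quote]Hello\nworld[/quote]</p>\n<p>Plain paragraph.</p>"
print(unwrap_shortcodes(sample))
```

Paragraphs that do not start with a bracketed tag are left alone, because the pattern requires the `[` right after the opening <p> (ignoring whitespace).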

Alan Moore 2010-02-02 02:24:10

Thanks a lot for your help!

Matt 2010-02-06 22:53:16
