I'm working in WordPress and need to remove images and empty paragraphs from post content. Removing the images works without a problem, but I then need to remove the empty paragraph tags they leave behind. I'm using PHP's preg_replace to apply the regexes.

So, as an example, I have the string:

<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>

I run this regex on it:

/<img.*?(>)/

And I end up with this string:

<p style="text-align:center;"></p><p>Some text</p>
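For reference, the image-removal step described above can be reproduced with a short script. This is only a sketch: the variable names ($html, $withoutImages) are made up for illustration, and it assumes the pattern is passed to preg_replace as a single-quoted string.

```php
<?php
// Sample input taken from the question.
$html = '<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>';

// The lazy quantifier .*? stops at the first ">" after "<img",
// so only the <img ... /> tag itself is removed.
$withoutImages = preg_replace('/<img.*?(>)/', '', $html);

echo $withoutImages, "\n";
// <p style="text-align:center;"></p><p>Some text</p>
```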

I then need to be able to remove the empty paragraph. I tried this, but it removes all paragraphs and the contents of the paragraphs:

/<p[^>]*><\/p[^>]*>/

Any help or suggestions are greatly appreciated!

+3  A: 

The correct regex is no regex. Use an HTML/DOM Parser instead. They're simple to use. Regex is for regular languages (which HTML is not).
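As a rough illustration of the parser approach, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). The function name removeImagesAndEmptyParagraphs and the wrapping <div> are assumptions made for this example, not anything from the question, and "empty" is taken to mean a paragraph with no child elements and only whitespace text. Non-ASCII input may also need an encoding hint that is left out here.

```php
<?php
// Hypothetical helper: removes <img> elements, then any <p> left empty.
function removeImagesAndEmptyParagraphs(string $html): string
{
    $doc = new DOMDocument();

    // Suppress warnings about imperfect markup while parsing the fragment.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it can be located and serialized again afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Node lists are live, so copy them to arrays before removing nodes.
    foreach (iterator_to_array($wrapper->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    foreach (iterator_to_array($wrapper->getElementsByTagName('p')) as $p) {
        $hasElements = $p->getElementsByTagName('*')->length > 0;
        if (!$hasElements && trim($p->textContent) === '') {
            $p->parentNode->removeChild($p);
        }
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$input = '<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>';
echo removeImagesAndEmptyParagraphs($input), "\n";
```

Because the check works on parsed nodes rather than on text, a paragraph that still holds text or other elements is left alone regardless of its attributes.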

webbiedave
Thanks for the info. I'll have to check this out.
matthewpavkov
A: 

/<p[^>]*><\/p[^>]*>/ (the regex you gave) should work fine. If it's giving you trouble you could try double-escaping the / like this: /<p[^>]*><\\/p[^>]*>/

PHP is funny about quoting and escape characters. For example "\n" is not equal to '\n'. The first is a line break, the second is a literal backslash followed by an 'n'. The PHP manual entry on string literals is probably worth a quick look.
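The quoting difference is easy to check directly. A small sketch (the variable names are arbitrary):

```php
<?php
$double = "\n"; // double quotes: the escape is interpreted, giving one line-feed character
$single = '\n'; // single quotes: kept literally, a backslash followed by the letter n

var_dump(strlen($double));      // int(1)
var_dump(strlen($single));      // int(2)
var_dump($double === $single);  // bool(false)
```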

no
"/n" and '/n' are identical; each is a two-character string consisting of a forward-slash followed by an 'n'. Also, your double-escape suggestion would introduce a spurious backslash into the regexp.
Ben Dunlap
Heh, those were supposed to be backslashes. The double-escape is needed if it's inside a single quoted string but not a double quoted one, I think. Let me double check.
no
The double-escape suggestion doesn't insert any extra characters into the regex. It doesn't help either though. In fact it does nothing.
no
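The points made in these comments can be checked with a short script. This is a sketch under the assumption that the pattern is written as a single-quoted PHP string; the variable names are made up for illustration.

```php
<?php
// In a single-quoted string, '\/' keeps its backslash and '\\' collapses
// to a single backslash, so both literals produce the same pattern text.
$plain   = '/<p[^>]*><\/p[^>]*>/';
$doubled = '/<p[^>]*><\\/p[^>]*>/';
var_dump($plain === $doubled); // bool(true)

// The pattern from the question, applied to the string left after image removal.
$html = '<p style="text-align:center;"></p><p>Some text</p>';
echo preg_replace($plain, '', $html), "\n";
// <p>Some text</p>
```

On this sample input the original pattern removes only the empty paragraph, which suggests that the behaviour described in the question comes from somewhere other than the pattern itself.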