If I have a string like

<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>

how can I turn it into just

<p>blah blah blah this is some real content</p>

It needs to pick up both &nbsp; entities and regular spaces.

+1  A: 

This regex will work against your example:

<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>
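A minimal sketch of how a pattern like this could be applied in PHP, assuming the input is valid UTF-8. The function name remove_empty_paragraphs is hypothetical, and the pattern is a simplified variant of the one above, not the original. It also treats literal non-breaking spaces (U+00A0) as empty and swallows the whitespace after each removed paragraph:

```php
<?php
// Hypothetical helper, for illustration only.
function remove_empty_paragraphs(string $html): string
{
    // A paragraph counts as empty when it holds nothing but whitespace,
    // &nbsp; entities, literal U+00A0 characters, or <br> tags.
    // \b keeps the pattern from matching tags such as <pre>.
    // The trailing \s* also removes the line break after each match.
    $pattern = '#<p\b[^>]*>(?:\s|&nbsp;|\x{00A0}|<br\s*/?>)*</p>\s*#iu';

    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on failure (for example, invalid UTF-8
    // with the u modifier); fall back to the untouched input in that case.
    return $result === null ? $html : trim($result);
}

$input = <<<HTML
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
HTML;

// Prints: <p>blah blah blah this is some real content</p>
echo remove_empty_paragraphs($input), "\n";
```

Using single-character alternatives inside one repeated group, rather than nesting a + inside the *, avoids heavy backtracking on long runs of spaces in paragraphs that turn out not to be empty.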
Peter Boughton
+5  A: 
$result = preg_replace('#<p[^>]*>(\s|&nbsp;?|<br\s*/?>)*</p>#', '', $input);

This doesn't catch literal non-breaking space characters (U+00A0) in the markup, but those are very rare to see.

Since you're dealing with HTML, if this is user input I'd suggest using HTML Purifier, which will also deal with XSS vulnerabilities. The configuration setting that removes empty p tags there is %AutoFormat.RemoveEmpty.
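A rough configuration sketch for that setting, assuming HTML Purifier is installed through Composer (the vendor/autoload.php path is an assumption about your project layout). The AutoFormat.RemoveEmpty.RemoveNbsp directive is an extra that is not mentioned above; it asks the library to treat paragraphs containing only non-breaking spaces as empty too:

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Remove elements that contain no meaningful content.
$config->set('AutoFormat.RemoveEmpty', true);
// Also treat elements containing only non-breaking spaces as empty.
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);

$purifier = new HTMLPurifier($config);

// $input is the untrusted HTML string from the question.
$result = $purifier->purify($input);
```

Paragraphs that contain only a <br /> may still survive this, since the br element is itself content, so it is worth checking the output against your own markup.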

Edward Z. Yang
+1  A: 

As the original replier stated, regex isn't the best solution here; what you want is some sort of HTML stripper.

A function on this site should help you out: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page

You just need to use a bit of string manipulation to get the newlines and whatnot back into the format you want.

jim