I'm using TinyMCE (WYSIWYG) as the default editor in one of my projects, and sometimes it automatically adds <p>&nbsp;</p>, <p> </p>, or empty divs.

I have been searching, but I couldn't find a good way of removing empty tags with a regex.

The code I've tried is:

$pattern = "/<[^\/>]*>([\s]?)*<\/[^>]*>/";
$str = preg_replace($pattern, '', $str);

Note: I want to remove &nbsp; too :(

+5  A: 

Try /<(\w+)>(\s|&nbsp;)*<\/\1>/ instead. :)
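
As a rough sketch of how that pattern might be applied in PHP: the helper name strip_empty_tags() is made up for illustration, and the loop is an addition so that nested empty tags such as <div><p> </p></div> are also removed. Because the pattern is <(\w+)>, tags that carry attributes (for example <p class="x">) are not matched.

```php
<?php
// Sketch: strip elements that contain only whitespace and/or &nbsp; entities.
// strip_empty_tags() is a hypothetical helper, not part of PHP or TinyMCE.
function strip_empty_tags(string $html): string
{
    // <(\w+)> captures the tag name; \1 requires the matching closing tag.
    // (?:\s|&nbsp;)* allows any mix of whitespace and &nbsp; entities in between.
    $pattern = '/<(\w+)>(?:\s|&nbsp;)*<\/\1>/';

    // Repeat until nothing changes, so an outer tag that becomes empty
    // after its child is removed gets stripped on the next pass.
    do {
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

echo strip_empty_tags('<p>Hello</p><p>&nbsp;</p><div><p> </p></div>');
// <p>Hello</p>
```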

BrianHjoellund
Would you then replace the whitespace in the second argument to preg_replace()?
pix0r
You can use \2 (or $2, I forget the syntax in PHP) to put back the whitespace that was between the tags.
BrianHjoellund
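
A small sketch of the whitespace-preserving idea from the comments above. One caveat: with (\s|&nbsp;)* the second group holds only the last repetition, so this version wraps the whole run in an extra capturing group. That adjustment is an assumption about what was intended, not something stated in the thread.

```php
<?php
// Sketch: remove empty tags but keep the whitespace they contained.
// The outer group ((?:\s|&nbsp;)*) captures the entire run of whitespace
// and &nbsp; entities, so $2 in the replacement restores all of it.
$pattern = '/<(\w+)>((?:\s|&nbsp;)*)<\/\1>/';

$input  = 'Hello<span> </span>world';
$output = preg_replace($pattern, '$2', $input);

echo $output; // Hello world
```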
A: 

That regexp is a little odd - but looks like it might work. You could try this instead:

$pattern = ':<[^/>]*>\s*</[^>]*>:';
$str = preg_replace($pattern, '', $str);

Very similar though.

pix0r
Dropping the white space may not be a great idea. You probably don't want "Hello<span> </span>world" to become "Helloworld".
Ben Blank
A: 

I know it's not directly what you asked for, but after months of TinyMCE, coping with not only this but the hell that results from users posting directly from Word, I have made the switch to FCKeditor and couldn't be happier.

EDIT: Just in case it's not clear, what I'm saying is that FCKeditor doesn't insert arbitrary paras where it feels like it, plus copes with pasted Word crap out of the box. You may find my previous question to be of help.

da5id
A: 

You would want multiple regexes, so that a single generic one does not eliminate elements you want to keep.

As Ben said, you may drop valid elements with one generic regex:

<\s*[^>]*>\s*&nbsp;\s*<\s*[^>]*>
<\s*p\s*>\s*<\s*/p\s*>
<\s*div\s*>\s*<\s*/div\s*>
AppDeveloper
No need for multiple regexes, you can just do /<(p|div)>(\s|&nbsp;)*<\/\1>/ instead. Add tag names as appropriate.
BrianHjoellund
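
For illustration, the tag-list approach from the last comment could be wrapped up roughly like this. The function name strip_listed_empty_tags() and the $tags parameter are invented for the example. Only the listed tags are touched, so an inline element like <span> </span> keeps its whitespace.

```php
<?php
// Sketch: only strip empty elements whose tag name is on an explicit list.
// strip_listed_empty_tags() and $tags are illustrative names, not from the thread.
function strip_listed_empty_tags(string $html, array $tags = ['p', 'div']): string
{
    // preg_quote() guards against regex metacharacters in the tag names.
    $quoted = array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $tags);

    // Builds e.g. /<(p|div)>(?:\s|&nbsp;)*<\/\1>/
    $pattern = '/<(' . implode('|', $quoted) . ')>(?:\s|&nbsp;)*<\/\1>/';

    return preg_replace($pattern, '', $html);
}

echo strip_listed_empty_tags('Hello<span> </span>world<p>&nbsp;</p><div> </div>');
// Hello<span> </span>world
```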