ansaurus

Question

Regexp for cleaning the empty, unnecessary HTML tags

Answer 1

+5 A:

Try /<(\w+)>(\s|&nbsp;)*<\/\1>/ instead. :)

BrianHjoellund 2009-05-22 22:21:36
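
A minimal sketch of how a backreference pattern of this kind could be applied in PHP. The helper name strip_empty_tags is made up for illustration, and the second alternative is assumed to be the &nbsp; entity:

```php
<?php
// Hypothetical helper, not part of the original answer.
function strip_empty_tags(string $html): string
{
    // <(\w+)>        opening tag without attributes; the name is group 1
    // (\s|&nbsp;)*   only whitespace or &nbsp; entities between the tags
    // <\/\1>         closing tag whose name equals group 1
    $pattern = '/<(\w+)>(\s|&nbsp;)*<\/\1>/';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo strip_empty_tags('<p>Text</p><p>&nbsp;</p><div> </div>'), "\n";
// prints: <p>Text</p>
```

Two limits worth keeping in mind: the pattern only sees tags without attributes, and a single pass turns <div><p></p></div> into <div></div> rather than removing both, so nested empties would need the replacement repeated until nothing changes.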

Would you then replace the whitespace in the second argument to preg_replace()?

pix0r 2009-05-22 23:13:36

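You can use \2 or $2 (preg_replace() accepts both in the replacement string) to insert the whitespace between the tags. For that to keep all of the whitespace, the group has to wrap the whole repetition, e.g. ((?:\s|&nbsp;)*), since a repeated group only retains its last repetition.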

BrianHjoellund 2009-05-23 09:51:17
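
As a rough illustration of keeping the inner whitespace while dropping the tags, the sketch below puts group 2 around the complete run of whitespace and substitutes it back. The function name unwrap_empty_tags is hypothetical:

```php
<?php
// Hypothetical helper: removes the empty element but keeps what was inside it.
function unwrap_empty_tags(string $html): string
{
    // Group 2 encloses the entire (?:\s|&nbsp;)* run, so it captures
    // every whitespace character or &nbsp; between the two tags.
    $pattern = '/<(\w+)>((?:\s|&nbsp;)*)<\/\1>/';

    // '$2' in the replacement stands for the text captured by group 2.
    return preg_replace($pattern, '$2', $html) ?? $html;
}

echo unwrap_empty_tags('Hello<span> </span>world'), "\n";
// prints: Hello world
```

Whether a retained &nbsp; is wanted in the output is a separate decision; this version passes it through unchanged.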

Answer 2

A:

That regexp is a little odd - but looks like it might work. You could try this instead:

$pattern = ':<[^/>]*>\s*</[^>]*>:';
$str = preg_replace($pattern, '', $str);

Very similar though.

pix0r 2009-05-22 22:22:09

Dropping the white space may not be a great idea. You probably don't want "Hello<span> </span>world" to become "Helloworld".

Ben Blank 2009-05-22 22:43:29
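
To make that concern concrete, here is a short sketch that runs the pattern from this answer on two sample strings. The inputs are invented for illustration:

```php
<?php
// The pattern from the answer, using ':' as the delimiter.
$pattern = ':<[^/>]*>\s*</[^>]*>:';

// A whitespace-only element is removed entirely, so the words run together.
$joined = preg_replace($pattern, '', 'Hello<span> </span>world');
echo $joined, "\n";
// prints: Helloworld

// The closing tag is not compared with the opening tag,
// so a mismatched pair is removed as well.
$mismatched = preg_replace($pattern, '', '<b>bold</b><i></u>');
echo $mismatched, "\n";
// prints: <b>bold</b>
```

Because [^/>]* excludes slashes, an opening tag with a slash in an attribute value (a URL, for example) is not matched by this pattern at all.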

Answer 3

A:

I know it's not directly what you asked for, but after months of TinyMCE, coping with not only this but the hell that results from users posting directly from Word, I have made the switch to FCKeditor and couldn't be happier.

EDIT: Just in case it's not clear, what I'm saying is that FCKeditor doesn't insert arbitrary paras where it feels like it, plus copes with pasted Word crap out of the box. You may find my previous question to be of help.

da5id 2009-05-22 22:51:39

Answer 4

A:

You would want multiple regexes, to be sure you do not eliminate other wanted elements with a single generic one.

As Ben said, you may drop valid elements with one generic regex:

<\s*[^>]*>\s*&nbsp;\s*<\s*[^>]*>
<\s*p\s*>\s*<\s*/p\s*>
<\s*div\s*>\s*<\s*/div\s*>

AppDeveloper 2009-05-22 23:44:58

No need for multiple regexes, you can just do /<(p|div)>(\s|&nbsp;)*<\/\1>/ instead. Add tag names as appropriate.

BrianHjoellund 2009-05-23 09:49:35
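
One way the tag list could be kept configurable is to build the alternation from an array, sketched below. The function name strip_empty and the case-insensitive /i flag are assumptions added for the example, not something the thread specifies:

```php
<?php
// Hypothetical helper: removes empty elements only for the listed tag names.
function strip_empty(string $html, array $tags): string
{
    // preg_quote() escapes regex metacharacters and the '/' delimiter.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $tags));

    // Same shape as /<(p|div)>(\s|&nbsp;)*<\/\1>/, with the names filled in.
    $pattern = '/<(' . $names . ')>(?:\s|&nbsp;)*<\/\1>/i';

    return preg_replace($pattern, '', $html) ?? $html;
}

echo strip_empty('<p>&nbsp;</p><td></td><DIV> </DIV><p>kept</p>', ['p', 'div']), "\n";
// prints: <td></td><p>kept</p>
```

Elements that are not on the list, such as the empty <td> here, are left alone, which is the point of restricting the pattern to named tags.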
