I've installed CKEditor on a client's site so that they can enter some text using a WYSIWYG editor. It is locked down for the most part, only allowing bold, italic, unordered lists, etc.

I also run the user-submitted HTML through HTML Purifier to make sure they don't get smart and start adding tables, for example. Limiting what they can enter also helps with XSS concerns.

I just looked at some of the output produced by the CKEditor/HTML Purifier combo, and saw this atrocity...

<p>
    <span style="font-size:11px;"><br /></span></p>

Is there any way I can configure HTML Purifier, or use something else, to remove elements that contain no text node? It would obviously need to start at the deepest descendant and then work up the DOM tree, removing the topmost ancestor whose subtree contains no text node.

Are there any edge cases to this approach, assuming there are never any presentational-only elements in that markup? I can't think of any self-closing elements that will be present (e.g. images, input elements, etc.).

A: 

You can use PHP's strip_tags() function:

http://de3.php.net/manual/en/function.strip-tags.php

This removes all HTML tags except the ones you pass as the second parameter. In your case, to allow only bold, italic and unordered lists, that would be:

// <li> has to be allowed as well, otherwise the list items inside <ul> lose their tags
$text = strip_tags($text, '<b><i><ul><li>');

Empty elements can still remain after this, but it will at least get rid of the span and p tags.
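If empty elements are still left over, something along these lines might clear them out. This is only a rough sketch using PHP's DOM extension, not anything built into strip_tags() or HTML Purifier. The function name removeEmptyElements() and the wrapper div with id "wrap" are made up for the example. It walks the elements from the deepest ones upward and drops every element whose text content is only whitespace (non-breaking spaces are treated as whitespace here, which is an assumption about what you want). Be aware that this also removes a lone <br /> sitting between two pieces of text.

```php
<?php
// Hypothetical helper (the name is illustrative): removes every element
// that contains no visible text, checking children before their parents.
function removeEmptyElements(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about fragments, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint makes libxml read the input as UTF-8.
    // The wrapper div (id chosen for this example) lets us get the fragment back out.
    $doc->loadHTML('<?xml encoding="utf-8"?><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap  = $xpath->query('//div[@id="wrap"]')->item(0);

    // Descendants come back in document order; reversing the list means
    // children are visited before their parents.
    $nodes = iterator_to_array($xpath->query('.//*', $wrap), false);
    foreach (array_reverse($nodes) as $node) {
        // textContent includes the text of all descendants.
        // Treat non-breaking spaces as ordinary whitespace (assumption).
        $text = str_replace("\u{00A0}", ' ', $node->textContent);
        if (trim($text) === '') {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

You would call it on the text after strip_tags() (or after HTML Purifier), e.g. $text = removeEmptyElements($text);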

JochenJung