I am constructing a large HTML document from user-supplied fragments, which have the annoying habit of being malformed in various ways. Browsers are robust and forgiving enough, but I want to be able to validate and (ideally) fix any malformed HTML where possible. For example:

<td><b>Title</td>

can be reasonably fixed to:

<td><b>Title</b></td>

Is there a way of doing this easily in PHP?

+8  A: 

You can use HTML Tidy; the man pages are here.
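
As a minimal sketch of how this might look from PHP, assuming the tidy extension is installed and enabled. The helper name `tidy_repair` is made up for illustration, and the input is a simplified variant of the fragment in the question:

```php
<?php
// Hypothetical helper: repair a markup string with PHP's tidy extension.
// Assumes ext-tidy is loaded; throws if it is not.
function tidy_repair(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        throw new RuntimeException('The tidy extension is not available.');
    }

    // Empty config = Tidy defaults, which produce a complete HTML document.
    $fixed = tidy_repair_string($html, [], 'utf8');
    if ($fixed === false) {
        throw new RuntimeException('Tidy could not process the input.');
    }

    return $fixed;
}
```

Calling `tidy_repair('<p><b>Title</p>')` should return markup in which the unclosed `<b>` has been closed. How Tidy treats a bare `<td>` outside a table may differ, so test it against your own fragments.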

cletus
HTML Tidy works really well, definitely recommended.
jtyost2
However, there are still a few problems with it. It used to remove my intentional whitespace, causing some JS problems. It also handles <script> tags in a way that IE6 sometimes fails to recognize, if you still want to optimize your site for IE6.
Ondrej Slinták
+1  A: 

Since I can't comment on the above: note that Tidy will add all of the other 'required' stuff around your HTML, even if the input is as simple as your example. This may or may not be what you want, especially if you're trying to build includable pages.
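
If the wrapper is not wanted, Tidy's `show-body-only` option is one way to suppress it. A rough sketch, assuming the tidy extension is available; the function name `tidy_repair_fragment` is invented for this example:

```php
<?php
// Hypothetical helper: repair a fragment without the <html>/<head>/<body> wrapper.
// Assumes ext-tidy is loaded; throws if it is not.
function tidy_repair_fragment(string $html): string
{
    if (!class_exists('tidy')) {
        throw new RuntimeException('The tidy extension is not available.');
    }

    $config = [
        'show-body-only' => true, // emit only what would go inside <body>
        'wrap'           => 0,    // do not re-wrap long lines
    ];

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}
```

With this configuration, `tidy_repair_fragment('<p><b>Title</p>')` should return just the repaired fragment, with no doctype or `<html>` element around it.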

gms8994
A: 

If you can't use Tidy (some hosting services do not enable this PHP module), you can use this PHP class: http://www.barattalo.it/html-fixer/

Pons
+2  A: 

I highly recommend HTML Purifier. From their site:

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!
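
A minimal usage sketch, assuming the library has already been loaded (for example through Composer's autoloader or the bundled `HTMLPurifier.auto.php`; the path depends on your install). The wrapper name `purify_fragment` is hypothetical:

```php
<?php
// Hypothetical wrapper around HTML Purifier.
// Assumes the HTMLPurifier classes are already loaded or autoloadable.
function purify_fragment(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Disable the on-disk definition cache so no writable cache directory
    // is needed; this is slower, so production setups usually keep caching on.
    $config->set('Cache.DefinitionImpl', null);

    $purifier = new HTMLPurifier($config);

    // purify() closes unbalanced tags and strips disallowed markup.
    return $purifier->purify($dirty);
}
```

Note that HTML Purifier filters as well as repairs, so elements it does not allow in a given context may be removed rather than fixed.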

Sonny