Hi,
You might want to take a look at a PHP tool called HTMLPurifier -- there is a demo page available, if you want to quickly check what it can do.
It takes "sort of" HTML as input and gives well-formed HTML as output; this way, you are not forcing your users to type well-formed HTML, but you can still "correct" what they typed.
Another nice thing is that you can specify which tags and attributes are allowed, which is good for security too:
- for instance, you can allow <p> and <strong> tags, but not <script>.
- you can also allow <a> with its href attribute, but not <a> with onclick.
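The whitelist described above might be configured roughly like this. It is only a minimal sketch: it assumes HTMLPurifier was installed through Composer (so that vendor/autoload.php exists), and the sample link and the exact list of allowed tags are illustrative choices, not something you have to copy.

```php
<?php
// Assumption: HTMLPurifier installed via Composer (package ezyang/htmlpurifier).
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Skip the on-disk definition cache so this demo needs no writable directory.
$config->set('Cache.DefinitionImpl', null);
// Whitelist: <p>, <strong>, <em>, and <a> with only its href attribute.
$config->set('HTML.Allowed', 'p,strong,em,a[href]');

$purifier = new HTMLPurifier($config);

// Illustrative input: a link carrying an onclick handler, plus a script block.
$dirty = '<p><a href="https://example.com/" onclick="alert(1)">link</a></p>'
       . '<script type="text/javascript">alert("glop");</script>';

$clean = $purifier->purify($dirty);
echo $clean, "\n";
```

With this configuration, anything that is not on the list (the onclick attribute, the <script> element) should be dropped from $clean, while the link and its href are kept.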
For instance, here is some malformed HTML you could give it:
<p>this is a <strong>test</p>
<script type="text/javascript">alert('glop');</script>
<p>And this is another <em>te<strong>st</em></strong></p>
And here is the well-formed, sanitized HTML it gives as output:
<p>this is a <strong>test</strong></p>
<p>And this is another <em>te<strong>st</strong></em></p>
What has changed?
- the <strong> tag in the first paragraph has been automatically closed
- the <script> tag and its content have been removed
- the order of the closing <em> and <strong> tags in the second paragraph has been corrected.
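If you want to reproduce this yourself, something along these lines should do. It is a sketch under the same kind of assumption as any quick test: HTMLPurifier is assumed to be installed through Composer, and the default configuration is used, so the exact whitespace of the result may differ from what is shown above.

```php
<?php
// Assumption: HTMLPurifier installed via Composer (package ezyang/htmlpurifier).
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Skip the on-disk definition cache so this demo needs no writable directory.
$config->set('Cache.DefinitionImpl', null);
$purifier = new HTMLPurifier($config);

// The malformed input from the example above.
$dirty = <<<'HTML'
<p>this is a <strong>test</p>
<script type="text/javascript">alert('glop');</script>
<p>And this is another <em>te<strong>st</em></strong></p>
HTML;

// purify() returns the cleaned-up markup as a string.
$clean = $purifier->purify($dirty);
echo $clean, "\n";
```

In a real application you would typically run user input through purify() once, when it is saved or just before it is displayed, and only ever output the purified string.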
This was just a quick example, of course -- I hope it helped.