
I've been scanning some of the discussions on sanitizing HTML markup strings for redisplay on a page (e.g. blog comments). In the past I've simply escaped all of the markup before redisplaying it.
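For reference, the "escape everything" baseline amounts to something like the following minimal Python sketch. `escape_comment` is just an illustrative name; the real work is done by the standard library's `html.escape`. Every tag is shown as literal text, so nothing can break the page, but no formatting survives either:

```python
import html


def escape_comment(raw: str) -> str:
    # Escape &, <, > and both quote characters so every tag is displayed as plain text.
    return html.escape(raw, quote=True)


print(escape_comment('<b>bold <script>alert(1)</script>'))
# → &lt;b&gt;bold &lt;script&gt;alert(1)&lt;/script&gt;
```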

Does anyone know of any solutions that go beyond just removing "unsafe" tags?

What if the markup is invalid? For example, how do you prevent an unclosed <b> tag from bolding all the text that follows it on the page?

It seems like Stack Overflow handles this.

Example of an unclosed <b> tag

Thanks.

A:

Stack Overflow uses a lightweight markup language very much like Textile (specifically, Markdown).

A Textile renderer is more or less guaranteed to produce valid (X)HTML, which sidesteps many of the typical problems with sanitizing user input.
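To illustrate why generating tags from a markup syntax avoids the unclosed-tag problem, here is a minimal Python sketch using a hypothetical toy syntax where `*text*` becomes `<strong>text</strong>`. This is not Textile or Markdown, and `render_toy_markup` is a made-up name. The point is the order of operations: the input is escaped first, and tags are emitted only for complete pairs, so a stray `*` or a raw `<b>` can never leave a tag open.

```python
import html
import re

# Hypothetical toy syntax (not real Textile): *text* means strong emphasis.
# The pattern only matches a complete pair on a single line.
_STRONG = re.compile(r"\*([^*\n]+)\*")


def render_toy_markup(source: str) -> str:
    # 1. Escape everything, so any raw HTML the user typed becomes inert text.
    escaped = html.escape(source, quote=True)
    # 2. Emit tags only from matched pairs; every <strong> gets its </strong>.
    return _STRONG.sub(r"<strong>\1</strong>", escaped)


print(render_toy_markup("a *bold* word and a stray * star"))
# → a <strong>bold</strong> word and a stray * star
print(render_toy_markup("<b>unclosed"))
# → &lt;b&gt;unclosed
```

An unmatched `*` simply stays a literal asterisk, and the raw `<b>` is displayed rather than interpreted, so neither can affect the rest of the page.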

Triptych
A: 

Check this code:

Sanitize HTML (I think Stack Overflow uses it somewhere).

It is a method that strips any potentially dangerous tags from raw HTML input using a whitelist-based approach, leaving only the "safe" HTML tags.
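The linked code isn't reproduced here. As a separate sketch of the same whitelist idea, assuming Python's standard-library `html.parser`, the version below also balances tags, which covers the unclosed `<b>` case from the question. Disallowed tags and all attributes are dropped, `<script>`/`<style>` contents are discarded, stray closing tags are ignored, and anything still open at the end is closed. `ALLOWED_TAGS`, `WhitelistSanitizer` and `sanitize` are hypothetical names, and the tag list is only an assumption.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist for this sketch: these tags survive, but without attributes.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code", "p", "br"}
VOID_TAGS = {"br"}                  # never pushed on the stack, never closed
DROP_CONTENT = {"script", "style"}  # the tag *and* its text are discarded


class WhitelistSanitizer(HTMLParser):
    """Hypothetical whitelist sanitizer that also keeps tags balanced.

    Comments, doctypes and processing instructions have no handler here,
    so HTMLParser's default no-op methods silently drop them.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of sanitized output
        self.open_tags = []    # stack of currently open whitelisted tags
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text still arrives via handle_data
        self.out.append(f"<{tag}>")  # all attributes are discarded
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag not in self.open_tags:
            return  # stray or disallowed closing tag: ignore it
        # Close every tag opened after this one so the nesting stays valid.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so a literal "<" or "&" can't form new markup.
            self.out.append(escape(data))


def sanitize(raw):
    parser = WhitelistSanitizer()
    parser.feed(raw)
    parser.close()
    # Balance: close whatever the user left open, innermost first.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

With this sketch, `sanitize('<b>unclosed text')` returns `<b>unclosed text</b>`, and `sanitize('<b><i>x</b> y')` returns `<b><i>x</i></b> y`. Because it keeps no attributes at all, links and images are not supported as written.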

CMS