I've been scanning some of the discussions on sanitizing HTML markup strings for redisplay on a page (e.g. blog comments). In the past I've just unilaterally escaped the markup for redisplay.
Does anyone know if there are any solutions out there that go beyond just removing "unsafe" tags?
What if the markup is invalid? For example, how do you prevent an unclosed <b> tag from bolding all the text that follows it on the page?
It seems like Stack Overflow handles this.
Example of unclosed 'b' tag
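To make the question concrete, here is a rough sketch of the behavior I'm after, using only Python's standard `html.parser` module. The whitelist `ALLOWED_TAGS`, the class `BalancingSanitizer`, and the function `sanitize` are names I made up for illustration. It keeps a stack of open whitelisted tags, drops all attributes and unknown tags, escapes text, and closes whatever is still open at the end. It is not a vetted security solution, just an illustration of the tag-balancing idea.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; anything not listed here is dropped.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code"}


class BalancingSanitizer(HTMLParser):
    """Keeps whitelisted tags, escapes text, and balances open tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of sanitized output
        self.open_tags = []  # stack of whitelisted tags still open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are discarded entirely in this sketch.
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore closing tags that were never opened.
        if tag not in self.open_tags:
            return
        # Close any tags opened after the matching one, then the match.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again for output.
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered text
        # Close whatever the author left open.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(markup):
    parser = BalancingSanitizer()
    parser.feed(markup)
    return parser.result()
```

With this sketch, `sanitize("hello <b>world")` returns `hello <b>world</b>`, so the bold no longer leaks into the rest of the page. What I don't know is whether an existing library already does this more robustly.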
Thanks.