I want to write a web application that allows users to enter any HTML that can occur inside a <div>
element. This HTML will then end up being displayed to other users, so I want to make sure that the site doesn't open people up to XSS attacks.
Is there a nice library in Python that will clean out all the event handler attributes, <script>
elements and other Javascript cruft from HTML or a DOM tree?
I am intending to use Beautiful Soup to regularize the HTML to make sure it doesn't contain unclosed tags and such. But, as far as I can tell, it has no pre-packaged way to strip all Javascript.
If there is a nice library in some other language, that might also work, but I would really prefer Python.
I've done a bunch of Google searching and hunted around on pypi, but haven't been able to find anything obvious.