I'm building a simple web-based forum application. I want to allow users to include HTML in their posts, but I'd like to prevent cross-site scripting (XSS). My current strategy is to disallow "script" tags, to allow only the "style" and "href" attributes on any tag, and to reject "href" values that start with "javascript:". Is there anything that I'm missing?
UPDATE: I ended up solving this with a whitelist of allowed HTML elements. When a disallowed element is found, I strip the tag but keep its inner HTML. This also solves the problem of people copying and pasting from an MS Word document. I also looked into antisamy.net but ran into issues with how it handles style attributes on spans (i.e. it removes them). If I can get that worked out, I may switch over to that solution.
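
For illustration, the whitelist approach described above might look roughly like the following in Python. This is only a sketch, not the code I actually run: the tag list, the allowed URL schemes, and the names `WhitelistSanitizer`, `is_safe_url`, and `sanitize` are all hypothetical, and it assumes the standard library's `html.parser` is an acceptable parser. It drops the contents of "script" and "style" elements entirely (my assumption, since leaving their text in the post seems unhelpful), and it does not inspect "style" attribute values at all, which would need their own check.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; adjust to whatever your forum should allow.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "span",
                "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"href", "style"}
ALLOWED_SCHEMES = {"http", "https", "mailto"}  # assumption, not from the post
VOID_TAGS = {"br"}                              # tags with no closing tag
DROP_CONTENT_TAGS = {"script", "style"}         # drop these AND their contents

SCHEME_RE = re.compile(r"([a-zA-Z][a-zA-Z0-9+.\-]*):")


def is_safe_url(url):
    """Allow relative URLs and URLs whose scheme is in ALLOWED_SCHEMES."""
    # Remove spaces and control characters first, so values such as
    # "java\tscript:..." are checked as "javascript:...".
    cleaned = "".join(ch for ch in url if ord(ch) > 0x20)
    match = SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme, so treat it as a relative URL
    return match.group(1).lower() in ALLOWED_SCHEMES


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # strip the tag itself; inner text is still emitted
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS or value is None:
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so stray "<" or "&" cannot form new markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p>Hi <font color="red">there</font></p>'))
print(sanitize('<a href="javascript:alert(1)" onclick="x()">x</a>'))
```

Because comments and unknown tags are simply not re-emitted, markup pasted from MS Word loses its wrapper elements while the visible text survives. For anything beyond a toy forum, a maintained sanitizer library is probably a safer bet than a hand-rolled one like this.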