The "high-level" best practice for doing this is:
- Store user input exactly as it was entered into the system
- HTML-encode all user input whenever it is output on any page
- Use a whitelist approach to "de-encode" the allowed HTML elements, attributes, attribute values, etc. that you encoded in the previous step
HTML-encoding user input on output makes the browser render any markup it contains as plain text, so user-entered JavaScript is not executed on your site.
The reason to store user input "as entered" is that you may later decide to output user data in other formats (PDF, email, JavaScript, RSS, etc.) that don't share HTML's encoding rules. Keeping the data as close to its original form as possible makes it easier to deal with later.
To HTML-encode user input, you can use System.Web.HttpUtility.HtmlEncode(...).
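As a rough illustration of the encode-then-whitelist idea, here is a minimal sketch. The SafeOutput class, the EncodeWithWhitelist method and the tag list are hypothetical names made up for this example; only HttpUtility.HtmlEncode comes from the framework. It assumes you only want to allow a handful of attribute-free formatting tags, which is far simpler than a real whitelist that also has to deal with attributes and attribute values.

```csharp
using System;
using System.Web; // HttpUtility lives here

public static class SafeOutput
{
    // Hypothetical whitelist: simple formatting tags, no attributes allowed.
    private static readonly string[] AllowedTags = { "b", "i", "em", "strong" };

    public static string EncodeWithWhitelist(string userInput)
    {
        // Encode everything first, so all markup becomes inert text.
        string encoded = HttpUtility.HtmlEncode(userInput ?? string.Empty);

        // Then restore only the exact encoded forms of the allowed tags.
        // Anything else (script tags, tags with attributes, upper-case
        // variants) stays encoded.
        foreach (string tag in AllowedTags)
        {
            encoded = encoded
                .Replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                .Replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }

        return encoded;
    }
}
```

For example, SafeOutput.EncodeWithWhitelist("<b>hi</b><script>alert(1)</script>") returns <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;: the bold tags come back, while the script tag stays encoded. The stored value is never modified; this runs only at output time.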
To combine steps 2 & 3, you can use Microsoft's AntiXSS library. It provides some extra encoding methods that the HttpUtility class lacks, which makes your job easier. I was unaware, until Malcolm pointed it out in the comments, that the latest version of this library includes a method called GetSafeHtmlFragment(...), which strips JavaScript out of the HTML you pass it. This handles the heavy lifting of removing user-entered JavaScript for you. You will most likely want to use GetSafeHtmlFragment and not GetSafeHtml, which is designed for entire HTML documents.