As we all know by now, XSS attacks are dangerous and really easy to pull off. Various frameworks make it easy to encode HTML, like ASP.NET MVC does:
<%= Html.Encode("string") %>
But what happens when your client requires that they be able to upload their content directly from a Microsoft Word document?
Here's the scenario: People can copy and paste content from Microsoft Word into a WYSIWYG editor (in this case TinyMCE), and that content is then posted to a web page.
The website is public, but only members of that organization will be able to post content to it.
What is the best way to handle this requirement? Currently there is no checking done on what the client posts (since only 'trusted' users can post), but I'm not particularly happy with that and would like to lock it down further in case an account is hacked.
The platform in question is ASP.NET MVC.
The only approach I'm aware of that meets these requirements is to whitelist HTML tags and let only those pass through. Is there another way? If not, is the best approach to store whatever they submit in the database as-is, but only display it once it has been properly encoded and stripped of bad tags?
NB: This differs from the similar existing question in that its author assumes there is only one way. I'm also asking the following questions:
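To make the whitelist idea concrete, here is a rough sketch of what I have in mind. It is written in Python with only the standard library rather than C#, purely to keep it short and language-agnostic. The tag list, attribute list, and URL schemes are placeholders I made up for illustration, and `sanitize` is a hypothetical helper, not an API from any framework. It does not rebalance mismatched tags and it drops relative links, so treat it as an illustration of the approach rather than a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the real list depends on what the editor should allow.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}            # attributes kept per tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_WITH_CONTENT = {"script", "style"}    # drop the tag and everything inside it
VOID_TAGS = {"br"}                         # tags that never get a closing tag


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, Word's class names, etc.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is always re-encoded, so stray < and > cannot form new tags.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text reduced to whitelisted tags and attributes."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Comments, including the conditional comments Word tends to paste in, are discarded because the parser's default comment handler emits nothing. The same shape could run either before storing the content or just before displaying it, which is the trade-off I'm asking about above.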
1. Is there a better way that doesn't rely on HTML Whitelists?
2. Is there a better way that relies on a different view engine?
3. Is there a WYSIWYG editor that includes the ability to whitelist on the fly?
4. Should I even worry about this, since it will only be for 'private posting' (much in the same way that a private blog allows HTML from the author, but since only he can post, it's not an issue)?
Edit #2:
If suggesting a WYSIWYG editor, it must be free (as in speech, or as in beer).
Update:
All of the suggestions so far revolve around a specific rich text editor. Please only suggest an editor if it can sanitize HTML tags and it accepts content pasted from a WYSIWYG editor like Microsoft Word.
There are three methods that I know of: 1. Do not allow HTML. 2. Allow HTML, but sanitize it. 3. Find a rich text editor that allows HTML and sanitizes it.
The previous questions remain (1-4 above).