As we all know by now, XSS attacks are dangerous and really easy to pull off. Various frameworks make it easy to encode HTML, like ASP.NET MVC does:

<%= Html.Encode("string") %>
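For reference, this is roughly what encoding does to the two kinds of input involved here. The sketch uses `System.Net.WebUtility.HtmlEncode`, a standard .NET method, in place of the MVC helper. The `EncodingDemo` class is hypothetical.

```csharp
using System.Net;

public static class EncodingDemo
{
    // Encode untrusted text so the browser displays it instead of interpreting it as markup.
    public static string EncodeForDisplay(string untrusted) =>
        WebUtility.HtmlEncode(untrusted ?? string.Empty);
}
```

`EncodeForDisplay("<script>alert(1)</script>")` returns `&lt;script&gt;alert(1)&lt;/script&gt;`, so the script is harmless. However, `EncodeForDisplay("<b>Quarterly report</b>")` likewise returns `&lt;b&gt;Quarterly report&lt;/b&gt;`, so formatting pasted from Word would show up as literal tags. That trade-off is what this question is about.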

But what happens when your client requires that they be able to upload their content directly from a Microsoft Word document?

Here's the scenario: People can copy and paste content from Microsoft Word into a WYSIWYG editor (in this case TinyMCE), and then that information is posted to a web page.

The website is public, but only members of that organization will have access to post information to a webpage.

What is the best way to handle this requirement? Currently there is no checking done on what the client posts (since only 'trusted' users can post), but I'm not particularly happy with that and would like to lock it down further in case an account is hacked.

The platform in question is ASP.NET MVC.

The only conceptual method that I'm aware of that meets these requirements is to whitelist HTML tags and let those pass through. Is there another way? If not, is the best way to let them store it in the database in any form, but only display it properly encoded and stripped of bad tags?
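A minimal sketch of the whitelist approach, assuming only a handful of formatting tags are allowed and that all attributes (Word's `class="MsoNormal"`, `style`, event handlers) may be discarded. The idea is to encode the entire input first and then rebuild only the whitelisted tags from scratch, so nothing the user typed is ever emitted verbatim as markup. `HtmlWhitelist`, `Sanitize` and the tag list are hypothetical. The sketch does not balance unclosed tags, does not support links or images, and keeps the text inside removed elements (for example a `<style>` block) as visible text, so a production version would more likely be built on a real HTML parser.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlWhitelist
{
    // Hypothetical whitelist: simple formatting tags, always rendered without attributes.
    private static readonly HashSet<string> AllowedTags = new HashSet<string>
    {
        "p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li", "h1", "h2", "h3"
    };

    // Matches an *encoded* tag such as "&lt;p class=&quot;MsoNormal&quot;&gt;" or "&lt;br /&gt;".
    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" for a self-closing tag.
    private static readonly Regex EncodedTag = new Regex(
        @"&lt;(/?)([a-z][a-z0-9]*)\b(?:(?!&lt;|&gt;).)*?(/?)&gt;",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        // 1. Encode everything, so by default nothing is interpreted as markup.
        string encoded = WebUtility.HtmlEncode(input);

        // 2. Rebuild whitelisted tags from scratch (all attributes dropped);
        //    remove every other tag, leaving the surrounding text encoded.
        return EncodedTag.Replace(encoded, m =>
        {
            string name = m.Groups[2].Value.ToLowerInvariant();
            if (!AllowedTags.Contains(name)) return string.Empty;
            string closing = m.Groups[1].Value;
            string selfClosing = m.Groups[3].Value.Length > 0 ? " /" : "";
            return "<" + closing + name + selfClosing + ">";
        });
    }
}
```

For example, `Sanitize("<p class=\"MsoNormal\"><b onmouseover=\"alert(1)\">Hi</b></p>")` returns `<p><b>Hi</b></p>`, and `Sanitize("<script>alert(1)</script>")` returns just the inert text `alert(1)`.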

NB: This question differs from the related one in that its author assumes there is only one way. I'm also asking the following questions:

1. Is there a better way that doesn't rely on HTML Whitelists?
2. Is there a better way that relies on a different view engine?
3. Is there a WYSIWYG editor that includes the ability to whitelist on the fly?
4. Should I even worry about this since it will only be for 'private posting' (much in the same way that a private blog allows HTML from the author, but since only he can post, it's not an issue)?

Edit #2:

If suggesting a WYSIWYG editor, it must be free (as in speech, or as in beer).

Update:

All of the suggestions thus far revolve around a specific Rich Text Editor to use. Only suggest an editor if it allows for sanitization of HTML tags and it fulfills the requirement of accepting documents pasted from a WYSIWYG editor like Microsoft Word.

There are three methods that I know of:

1. Not allow HTML.
2. Allow HTML, but sanitize it.
3. Find a Rich Text Editor that sanitizes and allows HTML.

The previous questions remain (1-4 above).


Related Question

Preventing Cross Site Scripting (XSS)

A: 

I am doing the same thing. I am using TinyMCE and allowing pasting from Word documents. Only certain people who maintain the site can do this, via an admin area secured by ASP.NET Membership. I'm simply doing Html.Encode when it gets sent out to the public site.

You could use the code below before it gets put in the database if you like, but I'm not sure what knock-on effect it would have. You may have to go with your whitelist.

    /// <summary>
    /// Strips HTML tags from a string and encodes any stray angle brackets.
    /// </summary>
    /// <param name="str">Text that may contain HTML.</param>
    /// <returns>The text with every tag replaced by a space.</returns>
    public static string StripHTML(string str)
    {
        if (str == null) return string.Empty;

        // Strip the HTML tags from str
        System.Text.RegularExpressions.Regex objRegExp = new System.Text.RegularExpressions.Regex("<(.|\n)+?>");

        // Replace all tags with a space, otherwise words on either side
        // of a tag might be concatenated
        string strOutput = objRegExp.Replace(str, " ");

        // Encode any remaining < and > as &lt; and &gt;
        strOutput = strOutput.Replace("<", "&lt;");
        strOutput = strOutput.Replace(">", "&gt;");

        return strOutput;
    }
Jon
If you do Html.Encode when it gets sent out to the public site, doesn't your HTML show up as it would in 'View source'?
George Stocker
If they store text such as <script>alert("hey")</script> and you run Html.Encode on it, it will just print that text to the page instead of running the alert.
Jon
I updated my post with a link to Jeff's whitelist; are you sure your whitelist gets everything?
George Stocker
I'm not using a whitelist; I am just storing it as is. The above function could help, but I don't know what knock-on effect it will have. I would like to know what you decide. Why is my post marked as negative?
Jon
I would guess that's because the way your software is doing it is a very naive implementation; there are all sorts of tricks that will get around your implementation.
George Stocker
How have you decided to progress then? Whitelist, WMD??
Jon
A whitelist is a good idea, but your method certainly is not. Regex is not a reliable way to detect tags in text, as HTML can get pretty obfuscated. Much better to use a library such as the HTML Agility Pack.
Noldorin
+5  A: 

The easiest way (for you as a developer) is probably to implement one of many variations of Markdown, for example Markdown.NET or, even better (imho), a wmd-editor.

Then your users would be able to paste simple HTML, but nothing dangerous, and they would be able to preview their entered data and iron out any problems before posting...
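The reason a Markdown-style approach is safe can be shown with a toy converter: the input is encoded as plain text first, and only the markup syntax itself produces tags, so raw HTML a user pastes is displayed rather than executed. This is not Markdown.NET or WMD, which can additionally let a limited subset of HTML through; `TinyMarkdown` is hypothetical and handles only `**bold**`, `*italic*` and blank-line-separated paragraphs.

```csharp
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class TinyMarkdown
{
    public static string ToHtml(string text)
    {
        // 1. Treat the whole input as text: any raw HTML the user typed is encoded.
        string encoded = WebUtility.HtmlEncode(text ?? string.Empty);

        // 2. Only the markup syntax generates tags. Bold runs before italic
        //    so that "**" is not consumed as two single asterisks.
        encoded = Regex.Replace(encoded, @"\*\*(.+?)\*\*", "<strong>$1</strong>");
        encoded = Regex.Replace(encoded, @"\*(.+?)\*", "<em>$1</em>");

        // 3. Blank lines separate paragraphs.
        var paragraphs = Regex.Split(encoded.Replace("\r\n", "\n"), @"\n\s*\n")
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .Select(p => "<p>" + p + "</p>");
        return string.Concat(paragraphs);
    }
}
```

`ToHtml("**hi** and *there*")` returns `<p><strong>hi</strong> and <em>there</em></p>`, while `ToHtml("<script>alert(1)</script>")` returns the script as encoded text inside a paragraph.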

Tomas Lycken
+1 for using a WMD editor
Charlino
Users then have to learn Markdown syntax. Not good for non-technical people.
Jon
They don't have to learn it... take for example the WMD editor with preview used on Stack Overflow.
Charlino
I believe StackOverflow use a custom editor without the need for WMD syntax
Jon
StackOverflow does indeed use WMD. http://blog.stackoverflow.com/2008/05/potential-markup-and-editing-choices/ http://stackoverflow.com/questions/98852/wysiwyg-text-editor-for-webpage/98873#98873
Tomas Lycken
What I meant is that StackOverflow have reverse engineered it and use custom code. Do you see the need to use WMD syntax when asking a question? No, because they have customised the editor.
Jon
What do you mean by WMD syntax? As far as I can tell, all WMD syntax works. And I haven't yet found anything that doesn't work...
Tomas Lycken
Ok; but how does your asking me to use WMD fit in with the ability to paste in word documents?
George Stocker
The editor is customized, but from what I've seen so far, it is mainly that the toolbar has been modified from the default WMD toolbar, which is not as pretty as the one on SO. You can tell WMD to output HTML or the raw syntax text that you see when you are editing posts here.
jn29098
+1  A: 

Regarding point #4: You bet it's still an issue! Most hacks are an inside job, after all.

For a specific editor, I've had good luck using FreeTextBox but I can't speak to how well it matches up to your requirements, especially MVC.

Joel Coehoorn
+2  A: 

Whitelisting is indeed the best way to prevent XSS attacks when allowing users to enter HTML, either directly or using a Rich Text Editor.

About your other questions:

Is there a WYSIWYG editor that includes the ability to whitelist on the fly?

I don't think this could work. You need server side code for this and the RTE runs on the client.

TinyMCE can filter tags if you want, but since this takes place in the browser you can't trust it. See the extended_valid_elements option. TinyMCE (Moxie) also suggests whitelisting.

Should I even worry about this since it will only be for 'private posting'

You should always filter HTML unless there are specific reasons not to (very rare). Some reasons: a) functionality that is for internal users today may be for the public tomorrow; b) unauthorized access will have less of an impact.

is the best way to let them store it in the Database in any form, but only display it properly encoded and stripped of bad tags?

That is the way I prefer it. I don't like to change user input before inserting into the database for various reasons.
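A sketch of that store-raw, sanitize-on-output arrangement, assuming a simple post model: the body is persisted exactly as submitted, and a sanitization policy is applied only when rendering. A side benefit is that tightening the policy later also applies to old posts without a data migration. `Post`, `Render` and `RenderPolicies` are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical model: the body is stored exactly as the user submitted it.
public sealed class Post
{
    public Post(string rawBody) => RawBody = rawBody ?? string.Empty;

    // Persisted unchanged, so nothing the user entered is lost.
    public string RawBody { get; }

    // Sanitization happens only at render time, using whatever policy is current.
    public string Render(Func<string, string> sanitizer) => sanitizer(RawBody);
}

public static class RenderPolicies
{
    // Most restrictive policy: show everything as text.
    public static string EncodeAll(string raw) => WebUtility.HtmlEncode(raw);
}
```

`new Post("<b>Hi</b>").Render(RenderPolicies.EncodeAll)` returns `&lt;b&gt;Hi&lt;/b&gt;`; passing a whitelist sanitizer instead would keep the allowed formatting, while the stored `RawBody` never changes.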

daremon
+1  A: 

IMHO, keep trusting your users until you go public.

Well, there is no reliable way to achieve your needs. For example, any WYSIWYG editor fails to protect from inserting images with URLs (indirect usage tracking, illegal content) or text (illegal text, misspelled text, wrongly sized text).

My point of view is that if you can trust your users, simply allow everything; just warn users if there is KNOWN dangerous markup (to keep them from making mistakes).
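A warning of that kind could be a simple scan for a few well-known risky patterns before saving. The sketch below is a blacklist: it only catches obvious cases, can produce false positives, and is meant to keep trusted users from mistakes, not to stop an attacker. `MarkupWarnings` and the pattern list are assumptions.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MarkupWarnings
{
    // Hypothetical list of well-known risky constructs (deliberately incomplete).
    private static readonly (string Description, Regex Pattern)[] Checks =
    {
        ("<script> element", new Regex(@"<\s*script\b", RegexOptions.IgnoreCase)),
        ("inline event handler (onclick=, onerror=, ...)", new Regex(@"\son[a-z]+\s*=", RegexOptions.IgnoreCase)),
        ("javascript: URL", new Regex(@"javascript\s*:", RegexOptions.IgnoreCase)),
        ("<iframe>, <object> or <embed> element", new Regex(@"<\s*(iframe|object|embed)\b", RegexOptions.IgnoreCase)),
    };

    // Returns a human-readable warning for each risky construct found in the HTML.
    public static List<string> Find(string html)
    {
        var warnings = new List<string>();
        if (string.IsNullOrEmpty(html)) return warnings;
        foreach (var (description, pattern) in Checks)
        {
            if (pattern.IsMatch(html)) warnings.Add(description);
        }
        return warnings;
    }
}
```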

If you do not trust, use sort of special markup (e.g. Markdown).

In my project we use special types for potentially dangerous content and special methods for rendering and accepting such content. This code is rated high in our threat model and gets close attention (for example, each change must be reviewed by two independent coders, we have a comprehensive test suite, and so on).
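As an illustration of the "special types" idea, here is a minimal sketch (not the answerer's actual code) of a wrapper whose default string conversion always encodes, so that emitting markup requires an explicit, easily reviewed call with a sanitizer. `UntrustedHtml` and `ToSanitizedHtml` are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical wrapper for user-supplied HTML; the raw value is never exposed as a plain string.
public sealed class UntrustedHtml
{
    private readonly string _raw;

    public UntrustedHtml(string raw) => _raw = raw ?? string.Empty;

    // Accidental use (concatenation, interpolation, ToString) yields encoded, inert text.
    public override string ToString() => WebUtility.HtmlEncode(_raw);

    // Emitting markup requires an explicit sanitizer, which makes such call sites easy to find and review.
    public string ToSanitizedHtml(Func<string, string> sanitizer)
    {
        if (sanitizer == null) throw new ArgumentNullException(nameof(sanitizer));
        return sanitizer(_raw);
    }
}
```

With `var body = new UntrustedHtml("<script>x</script>");`, both `body.ToString()` and `$"{body}"` yield `&lt;script&gt;x&lt;/script&gt;`.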

Mike Chaliy
+1  A: 

Use FCKeditor. It's extremely customizable, integrates into ASP.NET quite well and has a dedicated feature for pasting text from Word.

sinm
A: 

One option might be the HTML Edit Control for .NET (which I wrote).

It's a WYSIWYM HTML editor for .NET, which only supports a subset of the HTML elements, excluding <script> elements: so in that way it acts as a whitelist.

If it's for internal use (i.e. an intranet site), then the control can be embedded in a web page.

I haven't integrated support for pasting from Word, but I do have a component which is a step in that direction: a Doc to HTML converter; so I have the building blocks which you could use in ASP.NET to convert a Doc to HTML, display the HTML in the editor, etc.

ChrisW