
Let's say I have a simple ASP.NET MVC blog application and I want to allow readers to add comments to a blog post. If I want to prevent any type of XSS shenanigans, I could HTML-encode all comments so that they become harmless when rendered. However, what if I wanted to support some basic formatting like hyperlinks, bold, italics, etc.?
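The plain-encoding baseline described in the question can be sketched like this. The thread is about ASP.NET MVC, where `HttpUtility.HtmlEncode` plays this role; Python's standard `html.escape` is used here only as an illustration, and `encode_comment` is a hypothetical helper name.

```python
import html


def encode_comment(raw: str) -> str:
    """HTML-encode a whole comment so every tag in it renders as visible text."""
    # quote=True also escapes " and ', which matters if the value ever ends up
    # inside an HTML attribute.
    return html.escape(raw, quote=True)


print(encode_comment('<script>alert("xss")</script>'))
# prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

The downside is exactly the one the question raises: a `<b>` or `<a>` typed by a reader also comes out as literal text.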

I know that StackOverflow uses the WMD Markdown Editor, which seems like a great choice for what I'm trying to accomplish, if not for the fact that it supports both HTML and Markdown, which leaves it open to XSS attacks.

+3  A: 

How much HTML are you going to support? Just bold/italics/the basic stuff? In that case, you can convert those to markdown syntax and then strip the rest of the HTML.
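A minimal sketch of the "convert to Markdown, strip the rest" idea, using Python's standard `html.parser`. The tag-to-marker mapping and the `html_to_markdown` name are assumptions for illustration. The result is Markdown text, so it still has to be escaped when it is rendered later: decoded entities such as `&lt;script&gt;` come back as literal `<script>` characters.

```python
from html.parser import HTMLParser

# Hypothetical mapping: which tags survive, and as which Markdown markers.
MARKDOWN_MARKERS = {"b": "**", "strong": "**", "i": "*", "em": "*"}


class HtmlToMarkdown(HTMLParser):
    """Keep text, turn bold/italic tags into Markdown, drop every other tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in MARKDOWN_MARKERS and not self.skip_depth:
            self.parts.append(MARKDOWN_MARKERS[tag])

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in MARKDOWN_MARKERS and not self.skip_depth:
            self.parts.append(MARKDOWN_MARKERS[tag])

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_markdown(fragment: str) -> str:
    parser = HtmlToMarkdown()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(html_to_markdown('<b>bold</b> and <em>it</em><script>alert(1)</script>'))
# prints: **bold** and *it*
```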

The stripping needs to be done server-side, before you store the comment. You also need to validate the input on the server, at the same point where you check for SQL injection and other unwanted input.
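One way to picture "sanitize server-side, before you store it" together with the SQL side of the validation, assuming a hypothetical `comments` table in SQLite. Here `html.escape` only stands in for whatever sanitizer you actually use, and the `?` placeholders keep the comment text out of the SQL statement itself.

```python
import html
import sqlite3


def store_comment(conn: sqlite3.Connection, post_id: int, raw: str) -> int:
    """Sanitize on the server, then insert with bound parameters."""
    # Stand-in sanitizer: swap in your real whitelist / Markdown step here.
    clean = html.escape(raw.strip(), quote=True)
    # The "?" placeholders make the driver bind the values, so quotes or
    # semicolons in the comment cannot change the SQL statement.
    cur = conn.execute(
        "INSERT INTO comments (post_id, body) VALUES (?, ?)", (post_id, clean)
    )
    conn.commit()
    return cur.lastrowid


# Hypothetical schema, kept in memory just for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
new_id = store_comment(conn, 1, "'); DROP TABLE comments; -- <b>hi</b>")
print(conn.execute("SELECT body FROM comments WHERE id = ?", (new_id,)).fetchone()[0])
```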

Gerrit
Right on. Take a whitelist approach, not a blacklist approach.
Michael Haren
+1  A: 

I'd suggest you only submit the markdown syntax. On the front end, the client can type markdown and have an HTML preview (same as SO), but only submit the markdown syntax server-side. Then you can validate it, generate the HTML, escape it and store it.
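A sketch of the Markdown-only flow, assuming a deliberately tiny Markdown subset (`**bold**`, `*italic*` and `[text](url)` links); `render_comment` is a hypothetical name, and a real site would use a proper Markdown library. This sketch escapes everything first and only then generates the handful of tags it knows about, so any HTML the user typed stays inert.

```python
import html
import re

# Hypothetical Markdown subset: [text](http/https URL), **bold**, *italic*.
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)*]+)\)")
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")


def render_comment(markdown: str) -> str:
    """Turn submitted Markdown into HTML without letting user HTML through."""
    # 1. Escape first: any raw HTML the reader typed becomes inert text.
    text = html.escape(markdown, quote=True)
    # 2. Only then emit the few tags this renderer itself controls.
    #    Only http(s) URLs match, so javascript: links stay plain text.
    text = LINK.sub(r'<a href="\2">\1</a>', text)
    text = BOLD.sub(r"<strong>\1</strong>", text)
    text = ITALIC.sub(r"<em>\1</em>", text)
    return text


print(render_comment("**hi** <script>x</script> [SO](https://stackoverflow.com)"))
# prints: <strong>hi</strong> &lt;script&gt;x&lt;/script&gt; <a href="https://stackoverflow.com">SO</a>
```

Because the URL pattern only accepts `http://` and `https://`, input such as `[x](javascript:alert(1))` is left as plain text rather than turned into a link.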

I believe that's the way most of us do it. In any case, Markdown exists to spare people from writing structured HTML by hand, and to give that power to people who wouldn't even know how.

If there's something specific you'd like to do with the HTML, you can tweak it with some CSS inheritance (e.g. '.comment a { color: #F0F; }'), with front-end JS, or by traversing the HTML generated from the Markdown before you store it.
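Traversing the generated HTML before storing it could look like the sketch below, which re-serializes the markup and forces `rel="nofollow"` onto every link. The nofollow tweak is an illustrative choice, not something the answer prescribes. This pass is also not a sanitizer: it assumes the input is HTML your own Markdown renderer produced, and it keeps every tag it sees (HTML comments are dropped).

```python
import html
from html.parser import HTMLParser


class LinkTweaker(HTMLParser):
    """Re-serialize trusted, generated HTML, forcing rel="nofollow" on links."""

    def __init__(self) -> None:
        # Keep entities such as &amp; as-is instead of decoding them in text.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    @staticmethod
    def _open_tag(tag, attrs) -> str:
        # Attribute values arrive decoded, so escape them again on the way out.
        rendered = "".join(
            f' {name}="{html.escape(value or "", quote=True)}"' for name, value in attrs
        )
        return f"<{tag}{rendered}>"

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Hypothetical policy: drop any existing rel and add our own.
            attrs = [(k, v) for k, v in attrs if k != "rel"] + [("rel", "nofollow")]
        self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br />: reuse the start-tag logic, keep the " />".
        self.handle_starttag(tag, attrs)
        self.out[-1] = self.out[-1][:-1] + " />"

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(generated_html: str) -> str:
    parser = LinkTweaker()
    parser.feed(generated_html)
    parser.close()
    return "".join(parser.out)
```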

kRON
+1  A: 

Why don't you use Jeff's code? http://refactormycode.com/codes/333-sanitize-html

dr. evil
A: 

You could use an HTML whitelist so that certain tags can still be used, but everything else is blocked.
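A whitelist sanitizer in this spirit can be sketched with Python's `html.parser`. This is not the code linked above. The allowed tags, the single `href` attribute and the http/https scheme check are assumptions chosen for illustration (relative links get dropped too), and a maintained sanitizer library handles far more edge cases. Tags outside the whitelist are removed, but their text (for example the body of a `<script>`) is kept as escaped, inert text.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical whitelist: tag -> attributes allowed on that tag.
ALLOWED = {
    "b": set(), "strong": set(), "i": set(), "em": set(),
    "p": set(), "br": set(), "a": {"href"},
}
SAFE_SCHEMES = {"http", "https"}
VOID_TAGS = {"br"}  # tags that never get a closing tag


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.open_tags: list[str] = []  # whitelisted tags not yet closed

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag; its text still arrives via handle_data, escaped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Whitelist URL schemes too: javascript:, data: and relative URLs are dropped.
            if name == "href" and urlparse(value).scheme.lower() not in SAFE_SCHEMES:
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        # Close everything opened after `tag` as well, so the output nests correctly.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        # Text is always escaped, so nothing in it can become markup.
        self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close any whitelisted tags the user left open.
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)
```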

There are tools that can do this for you. SO uses the code that Slough linked.

EndangeredMassa
+1  A: 

I'd vote for FCKEditor, but you also have to do some extra processing on the output it returns.

Glenn
+5  A: 

If you are not looking to use an editor, you might consider OWASP's AntiSamy.

You can run an example here: http://www.antisamy.net/

Flory
A: 

If you need to do it in the browser: http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer

Mike Samuel
You can never trust user input; anything that comes from the browser might have been tampered with.
rjlopes
@rjlopes - That isn't a problem if you're trying to sanitize content from a server for presentation in the client.
Mike Samuel
@Mike - My fault, I assumed it was meant to be applied on the client before submitting the information to the server. However, in this particular case (you control the server), it doesn't make much sense to sanitize on the client. The only case where it would be useful is when you make Ajax requests to third-party websites.
rjlopes
@rjlopes It can legitimately happen on the client as well, just not in a way the server relies on. An Ajax-heavy application often needs to sync state between the browser and the server. When the user changes something, the client optimistically updates its model before sending it to the server to update the authoritative model, so that the application appears responsive. A comment edit box is a good example: Stack Overflow's question editor lets the user write a mix of Markdown and HTML, which it renders in a preview pane as the user types, without a server round-trip.
Mike Samuel