Hello, I've got the common situation where user input uses a subset of HTML (entered with TinyMCE). I need some server-side protection against XSS attacks and am looking for a well-tested tool that people are using to do this. On the PHP side I'm seeing lots of libraries like HTMLPurifier that do the job, but I can't seem to find anything in .NET.

I'm basically looking for a library that filters input down to a whitelist of tags and attributes on those tags, and does the right thing with "difficult" attributes like a:href and img:src.

I've seen Jeff Atwood's post at http://refactormycode.com/codes/333-sanitize-html, but I don't know how up-to-date it is. Does it have any bearing at all on what the site is currently using? And in any case I'm not sure I'm comfortable with that strategy of trying to regexp out valid input.

This blog post lays out what seems to be a much more compelling strategy:

http://blog.bvsoftware.com/post/2009/01/08/How-to-filter-Html-Input-to-Prevent-Cross-Site-Scripting-but-Still-Allow-Design.aspx

This method is to actually parse the HTML into a DOM, validate that, then rebuild valid HTML from it. If the HTML parsing can handle malformed HTML sensibly, then great. If not, no big deal -- I can demand well-formed HTML since the users should be using the tinyMCE editor. In either case I'm rewriting what I know is safe, well-formed HTML.

The problem is that's just a description, without a link to any library that actually executes that algorithm.

Does such a library exist? If not, what would be a good .NET HTML parsing engine? And what regular expressions should be used to perform extra validation on a:href and img:src? Am I missing something else important here?

I don't want to re-implement a buggy wheel here. Surely there are some commonly used libraries out there. Any ideas?

A: 

Hi Clyde,

I had the exact same problem a few years back when I was using TinyMCE.

There still don't seem to be any decent XSS / HTML white-listing solutions for .NET, so I've uploaded a solution I created and have been using for a few years.

http://www.codeproject.com/KB/aspnet/html-white-listing.aspx

The white list definition is based on TinyMCE's valid-elements.

Take Two: Looking around, Microsoft have recently released a white-list based Anti-XSS Library (V3.0), check that out:

The Microsoft Anti-Cross Site Scripting Library V3.0 (Anti-XSS V3.0) is an encoding library designed to help developers protect their ASP.NET web-based applications from XSS attacks. It differs from most encoding libraries in that it uses the white-listing technique -- sometimes referred to as the principle of inclusions -- to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and encodes anything outside this set (invalid characters or potential attacks). The white-listing approach provides several advantages over other encoding schemes.

New features in this version of the Microsoft Anti-Cross Site Scripting Library include:

- An expanded white list that supports more languages
- Performance improvements
- Performance data sheets (in the online help)
- Support for Shift_JIS encoding for mobile browsers
- A sample application
- Security Runtime Engine (SRE) HTTP module
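
To make the character-level white-listing in that description concrete, here is a minimal sketch of the idea. It is written in Python with only the standard library rather than .NET, and it is not the Anti-XSS API: the `SAFE_CHARS` set and the `allowlist_encode` name are assumptions made up for illustration, and the real library's allowed set is much larger.

```python
import string

# Assumed allowlist for illustration only: ASCII letters, digits, and a few
# punctuation characters. Everything else gets encoded.
SAFE_CHARS = set(string.ascii_letters + string.digits + " .,-_")


def allowlist_encode(text: str) -> str:
    """Replace every character outside SAFE_CHARS with a numeric HTML entity."""
    return "".join(ch if ch in SAFE_CHARS else f"&#{ord(ch)};" for ch in text)


print(allowlist_encode("<b>Hello, world.</b>"))
```

Note that this kind of encoding makes every tag inert, including the ones you wanted to keep, so on its own it does not cover the "allow a subset of HTML" case from the question.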

A: 

Microsoft has an open-source library to protect against XSS: AntiXSS.

Tommy Carlier
What's wrong with this answer? Why was it downvoted?
Tommy Carlier
Well, AntiXSS is just encoding; it's not a stripper or a whitelist solution (yet).
blowdart
OK, thanks. I haven't used it myself, so I probably should just stick to recommending stuff I know.
Tommy Carlier
+2  A: 

Well, if you want to parse and you're worried about invalid (X)HTML coming in, then the HTML Agility Pack is probably the best thing to use for parsing. Remember, though, that it's not just elements but also the attributes on allowed elements that you need to allow. (Of course, you should work from a whitelist of allowed elements and their attributes rather than try to strip things that might be dodgy via a blacklist.)
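
As a rough illustration of the parse, filter against a whitelist, and rebuild approach, here is a sketch in Python using only the standard library's `html.parser`. It is not HTML Agility Pack code, and the `ALLOWED` table, `AllowlistSanitizer`, and `sanitize` are hypothetical names invented for this example. URL-bearing attributes such as href and src are deliberately left out of the whitelist because they need their own validation.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration: tag name -> permitted attributes.
# href/src are omitted on purpose; they need separate URL validation.
ALLOWED = {"p": set(), "strong": set(), "em": set(), "abbr": {"title"}}
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the rebuilt HTML
        self.open_tags = []  # allowed tags currently open, innermost last
        self.dropping = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.dropping += 1
        elif tag in ALLOWED and not self.dropping:
            kept = [(k, v or "") for k, v in attrs if k in ALLOWED[tag]]
            attr_text = "".join(
                f' {k}="{escape(v, quote=True)}"' for k, v in kept
            )
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)
        # Any other tag is dropped; its text still reaches handle_data.

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.dropping = max(0, self.dropping - 1)
        elif tag in self.open_tags:
            # Close inner tags first so the output stays well formed.
            while self.open_tags:
                closing = self.open_tags.pop()
                self.out.append(f"</{closing}>")
                if closing == tag:
                    break

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    # Close anything the input left open.
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <strong>there</strong><script>alert(1)</script></p>'))
```

The key design choice is that nothing from the input is copied through verbatim: tags and attributes are re-emitted only if they are on the list, and all text and attribute values are re-escaped on the way out.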

There's also the OWASP AntiSamy Project, which is an ongoing work in progress. They also have a test site you can try to XSS.

Regex for this is probably too risky IMO.
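
For the a:href and img:src case specifically, one alternative to a regex is to parse the URL and compare its scheme against a short list. The following is only a sketch, in Python with the standard library rather than .NET; `ALLOWED_SCHEMES` and `is_safe_url` are names assumed for illustration, and the value passed in is assumed to be the attribute value after the HTML parser has already decoded entities.

```python
from urllib.parse import urlsplit

# Assumed policy for illustration; adjust to what your site needs.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(value: str) -> bool:
    value = value.strip()
    # Reject control characters outright: browsers ignore tabs and newlines
    # inside a URL, so "java\tscript:" could otherwise slip past the check.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value):
        return False
    try:
        scheme = urlsplit(value).scheme.lower()
    except ValueError:
        return False  # unparseable input is treated as unsafe
    # An empty scheme means a relative URL, which inherits the page's scheme.
    return scheme == "" or scheme in ALLOWED_SCHEMES


for candidate in ["https://example.com/x", "/local/page", "JaVaScRiPt:alert(1)"]:
    print(candidate, is_safe_url(candidate))
```

Checking for an allowed scheme, rather than trying to match and block "javascript:", keeps this consistent with the whitelist approach: anything not explicitly permitted is rejected.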

blowdart
The agility pack is what I ended up using. Seems to be working well
Clyde