We have an ASP.NET custom control that lets users enter HTML (similar to a rich text box). We noticed that a user could inject malicious client-side script inside a <script> tag in the HTML view. I can validate the HTML on save and remove any <script> elements.

Is this all I need to do? Are all tags other than <script> safe? If you were an attacker, what else would you try?

Are there any best practices I should follow?

EDIT: How is the Microsoft Anti-XSS library different from the native HtmlEncode for my purpose?

A:

XSS (Cross Site Scripting) is a big and difficult subject to tackle correctly.

Instead of blacklisting some tags (and missing some of the ways you may be attacked), it is better to decide on a set of tags that are OK for your site and allow only those.
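To illustrate the difference, here is a minimal Python sketch (not ASP.NET; the allowlist contents and the function and class names are hypothetical). It uses the standard library's html.parser to collect tag names and compares a naive <script> blacklist with a tag allowlist on an event-handler payload that contains no <script> tag at all.

```python
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; tailor it to what your editor actually needs.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}


def blacklist_ok(html: str) -> bool:
    """Naive blacklist: accept anything that does not contain a <script tag."""
    return "<script" not in html.lower()


class TagCollector(HTMLParser):
    """Records every start-tag name the parser sees (names arrive lower-cased)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br/> via the default handle_startendtag.
        self.tags.add(tag)


def allowlist_ok(html: str) -> bool:
    """Allowlist: reject input that uses any tag not explicitly permitted."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags <= ALLOWED_TAGS


payload = '<img src="x" onerror="alert(1)">'
print(blacklist_ok(payload))  # True: no <script>, so the blacklist lets it through
print(allowlist_ok(payload))  # False: <img> is not on the allowlist
```

This only checks tag names. Attributes such as onclick, or an <a> whose href uses javascript:, would still pass, so attributes and URL schemes need allowlists of their own.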

A whitelist by itself will not be enough, though: you will also have to catch every encoding an attacker might try, and there are other attacks besides. There are anti-XSS libraries that help; here is one from Microsoft.
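As a small example of the encoding problem, here is a Python sketch of a check for javascript: URLs in an href value. The helper names are hypothetical, and the normalization only approximates what browsers do (decode character references in attribute values, remove tabs and newlines anywhere in a URL, trim leading and trailing control characters and spaces). Each payload below slips past the naive check but is caught once the value is normalized.

```python
import html
import re

# Characters browsers trim from both ends of a URL: C0 controls (U+0000-U+001F) and space.
_EDGE_CHARS = "".join(chr(c) for c in range(0x21))


def naive_is_javascript(url: str) -> bool:
    """Hypothetical naive check: looks only at the raw attribute text."""
    return url.lower().startswith("javascript:")


def normalize_url(raw: str) -> str:
    """Hypothetical helper approximating what the browser actually sees."""
    url = html.unescape(raw)            # decode &#106; / &#x6A; / &amp; and friends
    url = re.sub(r"[\t\n\r]", "", url)  # tabs and newlines are removed anywhere in a URL
    return url.strip(_EDGE_CHARS)       # leading/trailing controls and spaces are dropped


def is_javascript(url: str) -> bool:
    return normalize_url(url).lower().startswith("javascript:")


for attack in ["&#106;avascript:alert(1)", "java\tscript:alert(1)", " \x01JaVaScRiPt:alert(1)"]:
    print(repr(attack), naive_is_javascript(attack), is_javascript(attack))
```

Note that this is still a blacklist of one scheme; allowing only known-good schemes (http, https, mailto) is the safer version of the same check.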

For more information and guidance, see this OWASP article.

Oded
Definitely use an existing library that implements a parse+whitelist approach! In addition to only allowing whitelisted tags and attributes, you need to make sure the library handles quirks in different browsers' HTML parsing! For example, a backspace character may cause your parser to skip some HTML as not a tag, but the browser may interpret it as '<script>'.
Annie
Is there a sample whitelist of all the allowable tags?
How is the anti Xss library different from the native HtmlEncode for my purpose?
Josh Stodola
Annie
A: 

Removing only the <script> tags will not be sufficient, as there are many ways to encode or hide them in input. Most languages now have anti-XSS and anti-CSRF libraries and functions for filtering input; you should use one of these well-established libraries to filter your user input.

I'm not sure what the best options are in ASP.NET, but this might shed some light: http://msdn.microsoft.com/en-us/library/ms998274.aspx

Eric
A:

Have a look at this page:

http://ha.ckers.org/xss.html

to get an idea of different XSS attacks that somebody may try.

Oli
A: 

This is called a Cross Site Scripting (XSS) attack. It can be very hard to prevent, as there are many surprising ways to get JavaScript code to execute: javascript: URLs, sometimes CSS, <object> and <iframe> tags, and so on.

The best approach is to whitelist tags, attributes, and types of URLs instead of blacklisting, keeping the whitelist as small as possible while still doing what you need. That means you only allow certain tags that you know are safe, rather than banning tags that you believe to be dangerous. This leaves fewer ways for people to get an attack into your system: a tag you didn't think about is rejected by default, whereas with a blacklist, anything you missed is still a vulnerability. Here's an example of a whitelist approach to sanitization.
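A minimal Python sketch of that idea, assuming the standard library's html.parser is acceptable as the parser (a maintained sanitizer library is still the better choice in production). The allowlists and the names sanitize, safe_url and AllowlistSanitizer are hypothetical. Instead of deleting bad pieces from the input, it rebuilds the output from scratch, emitting only allowlisted tags, per-tag attributes and URL schemes, and re-escaping all text.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlists for illustration; keep yours as small as your editor allows.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href", "title"}}   # per-tag attribute allowlist
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}                         # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}         # tags whose text content is discarded too


def safe_url(value: str) -> bool:
    """Accept relative URLs and allowlisted schemes only (rough, conservative check)."""
    # Drop spaces and control characters so "java\tscript:" is still recognised.
    cleaned = "".join(ch for ch in value if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:  # e.g. malformed IPv6 host; treat as unsafe
        return False
    return scheme == "" or scheme in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the markup, emitting only allowlisted pieces. Comments, doctypes and
    processing instructions are dropped because the default handlers ignore them."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # text and attribute values arrive decoded
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still kept (escaped)
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # event handlers, style, unknown attributes, etc.
            if name == "href" and not safe_url(value):
                continue  # javascript:, data:, vbscript: and friends
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))  # re-escape decoded text


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="steal()">Hi <a href="&#106;avascript:alert(1)">x</a>'
               '<script>alert(1)</script></p>'))
```

Because the output is rebuilt rather than filtered, anything the allowlist doesn't mention simply never reaches it. The sketch does not rebalance unclosed tags, which a real library would also handle.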

Brian Campbell
Is there a sample whitelist of tags and attributes I can use?
Did you take a look at the example I linked to? It has a sample list of tags and attributes that can be whitelisted.
Brian Campbell
A:

There's a whole lot to do when it comes to filtering out JavaScript from HTML. Here's a short list of some of the bigger points:

  • Multiple passes over the input are required to make sure that what you removed before doesn't create a new injection. If you're doing a single pass, something like <scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt> will get past you, since after you remove the <script> tags from the string, you'll have created a new one.
  • Strip the use of the javascript: protocol in href and src attributes.
  • Strip embedded event handler attributes like onmouseover/out, onclick, onkeypress, etc.
  • White lists are safer than black lists. Only allow tags and attributes that you know are safe.
  • Make sure you're handling one consistent character encoding throughout. If you treat the input as ASCII (single byte) and it contains Unicode (multibyte) characters, you're going to get a nasty surprise.
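To make the first point concrete, here is a small Python sketch. The regex and helper names are hypothetical, and this is a blacklist used purely to illustrate the multiple-pass problem, not a recommended filter. One substitution pass turns the nested payload into a working <script> element; repeating until the output stops changing removes it.

```python
import re

# Hypothetical blacklist regex, for illustration only: matches <script ...> and </script>.
SCRIPT_TAG = re.compile(r"</?script[^>]*>", re.IGNORECASE)


def strip_once(html: str) -> str:
    """Single pass: removes the inner tags, which glues the outer halves back together."""
    return SCRIPT_TAG.sub("", html)


def strip_until_stable(html: str) -> str:
    """Repeat the substitution until it no longer changes the string."""
    while True:
        cleaned = SCRIPT_TAG.sub("", html)
        if cleaned == html:
            return cleaned
        html = cleaned


payload = '<scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt>'
print(strip_once(payload))          # <script>alert("XSS!");</script>
print(strip_until_stable(payload))  # alert("XSS!");
```

Even the repeated version only deals with <script>; event handlers and javascript: URLs go straight through, which is why the whitelist point in the same list matters more.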

Here's a more complete cheat sheet. Also, Oli linked to a good article at ha.ckers.org with samples to test your filtration.

Justin Johnson