We have a high-security application and we want to allow users to enter URLs that other users will see.

This introduces a high risk of XSS hacks - a user could potentially enter JavaScript that another user ends up executing. Since we hold sensitive data, it's essential that this never happens.

What are the best practices for dealing with this? Is any security whitelist or escape pattern alone good enough?

Any advice on dealing with redirections (a "this link goes outside our site" message on a warning page before following the link, for instance)?

Is there an argument for not supporting user entered links at all?


Clarification:

Basically our users want to input:

stackoverflow.com

And have it output to another user:

<a href="http://stackoverflow.com">stackoverflow.com</a>

What I really worry about is them using this in an XSS hack. I.e. they input:

alert('hacked!');

So other users get this link:

<a href="alert('hacked!');">stackoverflow.com</a>

My example is just to explain the risk - I'm well aware that javascript and URLs are different things, but by letting them input the latter they may be able to execute the former.

You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know to deal with links do they also know to sanitise <iframe>, <img> and clever CSS references?

I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?

A: 

Allowing a URL and allowing JavaScript are two different things.

Nick Stinemates
No, they're not, if the URL is displayed back on the page.
Joel Coehoorn
?? a Uniform Resource Locator is not Javascript, displaying the URL back on the page has nothing to do with Javascript
warren
That's what I used to think, too. Trust me on this: you are wrong. And if you think you're right, you are in big trouble.
Jeff Atwood
Maybe I didn't explain it well enough: User enters "stackoverflow.com" and if we turn that into "<a href="http://stackoverflow.com">stackoverflow.com</a>" there's the risk introduced. If you just let anything through they can do: "<a href="alert('hacked!');">stackoverflow.com</a>"
Keith
warren
Yep, http://somewebsite.com <-- I assume that is the input, not <a href=""></a>
Nick Stinemates
+1  A: 

How about not displaying them as a link? Just use the text.

Combined with a warning to proceed at your own risk may be enough.

addition - see also http://stackoverflow.com/questions/176195/for-hosted-applications-should-i-be-sanitizing for a discussion on sanitizing user input

warren
That's an idea we thought of, definitely secure, but our users are relatively low-tech. They would really like links that they can click.
Keith
understandable, I prefer them generally, but copy/paste *does* make me take a couple seconds to decide if I *REALLY* want to do it
warren
That's not secure either. They could still find a way to embed a script tag.
Joel Coehoorn
Why are we allowing tags? I assume he was referring to turning any instance of http://somesite.com or https://somesite.com into <a href="http://somesite.com">http://somesite.com</a>
Nick Stinemates
+1  A: 

You don't specify the language of your application, so I will presume ASP.NET; for that you can use the Microsoft Anti-Cross Site Scripting Library V1.5.

It is very easy to use: all you need is an include and that is it :)

And while you're on the topic, why not give Design Guidelines for Secure Web Applications a read?

If you're using any other language: since there is a library for ASP.NET, one should be available for other languages as well (PHP, Python, RoR, etc.).

balexandre
We're specifically on C# 3.5 and ASP.Net - I'll check that library out.
Keith
+6  A: 

If you think URLs can't contain code, think again!

http://ha.ckers.org/xss.html

Read that, and weep.

Here's how we do it on Stack Overflow:

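using System.Text.RegularExpressions;

/// <summary>
/// Returns a "safe" URL by stripping every character outside the set normally used in URLs.
/// Note: this filters characters only; it does not inspect the URL scheme.
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}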
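
For anyone who wants to probe what a filter like this removes and what it leaves behind, here is a rough Python port of the same character class. The port and the name sanitize_url are illustrative assumptions, not Stack Overflow's actual code; it only mirrors the regex shown above.

```python
import re

# Same character class as the C# regex above: any character outside it is dropped.
_DISALLOWED = re.compile(r"[^-A-Za-z0-9+&@#/%?=~_|!:,.;()]")


def sanitize_url(url: str) -> str:
    """Strip every character that is not in the allowed URL character set."""
    return _DISALLOWED.sub("", url)


if __name__ == "__main__":
    for sample in (
        "http://stackoverflow.com/questions?x=1",
        "javascript:alert('hacked')",
        '<script>alert("x")</script>',
    ):
        print(repr(sample), "->", repr(sanitize_url(sample)))
```

Under this port, quotes and angle brackets are removed, but a scheme such as javascript: is left in place, so a character filter of this kind is probably best treated as one layer rather than the whole defence.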
Jeff Atwood
Great link. Time to add new test cases...
Mnebuerquo
I've seen that link before - it's part of what I worry about with this. We have to be very careful as a single XSS hack could cost us a great deal. Your Regex based solution seems to have been working well on SO, certainly. Would you consider it safe for, say, banking applications?
Keith
Not so well, I might say, Keith: it does not accept special chars in the URL (see http://stackoverflow.com/questions/209327/url-rewriting-international-letters#209776), which with URL rewriting are safe to pass, like: http://www.gynækologen.dk/Undersøgelser_og_behandlinger.aspx
balexandre
This is not enough. Unless I'm missing something, this string would pass through the filter: javascript:alert('hacked')
Patrick McElhaney
Even this would get through: javascript:while(true)alert('Hacked!'); I've tested a couple of places here on SO and it looks like SanitizeUrl is only part of the solution.
Patrick McElhaney
This set of characters still allows a lot of code. Lack of '"' can be worked around with /xxx/.source.
porneL
How does this fit with what you posted here? http://www.codinghorror.com/blog/archives/001181.html
Sam Hasler
+2  A: 

Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
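
A minimal sketch of that advice, written in Python rather than the C#/ASP.NET stack mentioned elsewhere in this thread. The function name render_link and the exact contents of ALLOWED_SCHEMES are assumptions for illustration; html.escape stands in for whatever HTML-encoding routine your platform provides.

```python
import html
from urllib.parse import urlsplit

# Assumed allowlist, taken from the protocols named above.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def render_link(url: str) -> str:
    """Return an <a> tag for url, or escaped plain text if its scheme is not allowed."""
    candidate = url.strip()
    try:
        scheme = urlsplit(candidate).scheme.lower()
    except ValueError:
        # urlsplit rejects some malformed input; treat it as having no scheme.
        scheme = ""
    # HTML-encode on output in every case, including quotes.
    safe = html.escape(candidate, quote=True)
    if scheme not in ALLOWED_SCHEMES:
        return safe
    return '<a href="{0}">{0}</a>'.format(safe)
```

With this sketch, javascript:alert('hacked!') comes back as escaped text rather than a link. Bare input such as stackoverflow.com also stays as text, because it has no scheme until one is added.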

Patrick McElhaney
porneL
+3  A: 

The process of rendering a link "safe" should go through three or four steps:

  • Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
  • Clean the link up: regexes are a good start. Make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're using the links only as references to other information, you can also force the protocol at the end of this process: if the portion before the first colon is not 'http' or 'https', prepend 'http://' to the string. This lets you create usable links from incomplete input, as a user would type into a browser, and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
  • Check that the result is a well formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
  • Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.
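
The first three steps above could be sketched roughly as follows, in Python. The name clean_link, the forbidden-character set and the ASCII-only host check are all assumptions made for illustration. The percent-decoding is used only to inspect the input, not to re-encode it, and the blacklist or malware lookup from the last step is left out.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Assumed set of characters that must never reach an href attribute:
# quotes, angle brackets, backslash, backtick, whitespace, control characters.
_FORBIDDEN = re.compile(r"""["'<>\\`\s\x00-\x1f\x7f]""")
# Loose, ASCII-only host check: at least two dot-separated labels.
_HOST = re.compile(r"(?!-)[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def clean_link(raw: str) -> Optional[str]:
    """Return a normalised http(s) URL, or None if the input looks unsafe."""
    candidate = raw.strip()
    # Step 1: decode percent-escapes on a copy so encoded tricks are checked too.
    decoded = unquote(candidate)
    # Step 2: throw the link away if it could break out of the attribute.
    if _FORBIDDEN.search(candidate) or _FORBIDDEN.search(decoded):
        return None
    # Step 2, continued: force the protocol based on the text before the first colon.
    scheme = candidate.split(":", 1)[0].lower() if ":" in candidate else ""
    if scheme not in ("http", "https"):
        candidate = "http://" + candidate
    # Step 3: check that the result is a well-formed URL.
    try:
        parts = urlsplit(candidate)
        parts.port  # accessing .port raises ValueError for a non-numeric port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or "@" in parts.netloc:
        return None
    if not parts.hostname or not _HOST.fullmatch(parts.hostname):
        return None
    return candidate
```

In this sketch stackoverflow.com becomes http://stackoverflow.com, while javascript:alert(1) is rejected because forcing the protocol turns it into a URL with a non-numeric port. It is deliberately strict: legitimate links containing %20 or non-ASCII host names are also thrown away.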

If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.

Bell