ansaurus

Question

Best way to handle security and avoid XSS with user entered URLs

Answer 1

A:

Allowing a URL and allowing JavaScript are two different things.

Nick Stinemates 2008-10-15 18:48:21

No, they're not, if the URL is displayed back on the page.

Joel Coehoorn 2008-10-15 18:49:12

?? A Uniform Resource Locator is not JavaScript; displaying the URL back on the page has nothing to do with JavaScript.

warren 2008-10-15 18:50:19

That's what I used to think, too. Trust me on this: you are wrong. And if you think you're right, you are in big trouble.

Jeff Atwood 2008-10-15 18:53:31

Maybe I didn't explain it well enough: the user enters "stackoverflow.com", and if we turn that into "<a href="http://stackoverflow.com">stackoverflow.com</a>", that is where the risk is introduced. If you just let anything through, they can do: "<a href="javascript:alert('hacked!');">stackoverflow.com</a>"

Keith 2008-10-15 18:53:42

warren 2008-10-15 18:56:37

Yep, http://somewebsite.com <-- I assume that is the input, not <a href=""></a>

Nick Stinemates 2008-10-15 19:54:16

Answer 2

+1 A:

How about not displaying them as a link? Just use the text.

That, combined with a warning to proceed at your own risk, may be enough.

Addition: see also http://stackoverflow.com/questions/176195/for-hosted-applications-should-i-be-sanitizing for a discussion on sanitizing user input.

warren 2008-10-15 18:49:15

That's an idea we thought of, definitely secure, but our users are relatively low-tech. They would really like links that they can click.

Keith 2008-10-15 19:06:52

Understandable; I generally prefer them too, but copy/paste *does* make me take a couple of seconds to decide if I *REALLY* want to do it.

warren 2008-10-15 19:47:14

That's not secure either. They could still find a way to embed a script tag.

Joel Coehoorn 2008-10-15 23:40:28

Why are we allowing tags? I assume he was referring to turning any instance of http://somesite.com or https://somesite.com into <a href="http://somesite.com">http://somesite.com</a>

Nick Stinemates 2008-10-16 00:25:21

Answer 3

+1 A:

You don't specify the language of your application, so I will presume ASP.NET; for that you can use the Microsoft Anti-Cross Site Scripting Library V1.5.

It is very easy to use: all you need is an include and that is it :)

And while you're on the topic, why not give Design Guidelines for Secure Web Applications a read?

If you're using another language: if there is a library for ASP.NET, there should be one available for other languages as well (PHP, Python, RoR, etc.).

balexandre 2008-10-15 18:51:47

We're specifically on C# with .NET 3.5 and ASP.NET; I'll check that library out.

Keith 2008-10-15 18:59:12

Answer 4

+6 A:

If you think URLs can't contain code, think again!

http://ha.ckers.org/xss.html

Read that, and weep.

Here's how we do it on Stack Overflow:

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}

Jeff Atwood 2008-10-15 18:56:39

Great link. Time to add new test cases...

Mnebuerquo 2008-10-15 19:47:37

I've seen that link before - it's part of what I worry about with this. We have to be very careful as a single XSS hack could cost us a great deal. Your Regex based solution seems to have been working well on SO, certainly. Would you consider it safe for, say, banking applications?

Keith 2008-10-16 09:51:26

Not so well, I might say, Keith: it does not accept special characters in the URL (see http://stackoverflow.com/questions/209327/url-rewriting-international-letters#209776), which are safe to pass with URL rewriting, for example: http://www.gynækologen.dk/Undersøgelser_og_behandlinger.aspx

balexandre 2008-10-16 20:40:50

This is not enough. Unless I'm missing something, this string would pass through the filter: javascript:alert('hacked')

Patrick McElhaney 2008-10-16 20:44:24

Even this would get through: javascript:while(true)alert('Hacked!'); I've tested a couple of places here on SO and it looks like SanitizeUrl is only part of the solution.

Patrick McElhaney 2008-10-16 20:50:25

This set of characters still allows a lot of code. Lack of '"' can be worked around with /xxx/.source.
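To see this concretely, here is a small sketch that reuses the character whitelist from the SanitizeUrl method posted in the answer. The wrapper class name SanitizeDemo is hypothetical and only there to make the snippet compile on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the regex is copied from the answer above.
public static class SanitizeDemo
{
    // Strips every character outside the answer's URL character whitelist.
    // Note that it never looks at the scheme, so "javascript:" is left alone.
    public static string SanitizeUrl(string url)
    {
        return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
    }
}
```

With this sketch, SanitizeUrl("javascript:alert('hacked')") returns javascript:alert(hacked): the single quotes are stripped, but the javascript: scheme survives. The quote-free variant javascript:alert(/hacked/.source) comes back unchanged, because every character in it is on the whitelist.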

porneL 2008-10-19 00:55:26

How does this fit with what you posted here: http://www.codinghorror.com/blog/archives/001181.html

Sam Hasler 2008-11-13 15:08:33

Answer 5

+2 A:

Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
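A minimal sketch of that advice in C#, assuming the anchor is built by hand rather than through a framework helper. The class name SafeLink and the method Render are hypothetical; Uri.TryCreate and WebUtility.HtmlEncode are standard .NET calls.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of any library.
public static class SafeLink
{
    // Schemes accepted for user-entered links; everything else is rejected.
    private static readonly string[] AllowedSchemes =
    {
        Uri.UriSchemeHttp, Uri.UriSchemeHttps, Uri.UriSchemeMailto
    };

    // Returns an HTML anchor for the URL, or null when the input is not
    // an absolute URL with a whitelisted scheme.
    public static string Render(string userUrl)
    {
        Uri uri;
        if (!Uri.TryCreate(userUrl, UriKind.Absolute, out uri))
            return null;

        // Uri.Scheme is already lower-cased, so "JavaScript:" is caught too.
        if (Array.IndexOf(AllowedSchemes, uri.Scheme) < 0)
            return null;

        // Encode both the attribute value and the visible text on output.
        string href = WebUtility.HtmlEncode(uri.AbsoluteUri);
        string text = WebUtility.HtmlEncode(userUrl);
        return "<a href=\"" + href + "\">" + text + "</a>";
    }
}
```

A caller would treat a null result as "show the input as plain text, or reject it". Input without a scheme, such as stackoverflow.com, is also rejected here; whether to add http:// for the user is a separate decision.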

Patrick McElhaney 2008-10-15 18:57:01

porneL 2008-10-19 00:56:29

Answer 6

+3 A:

The process of rendering a link "safe" should go through three or four steps:

1. Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
2. Clean the link up: Regexes are a good start - make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're doing the links only as references to other information you can also force the protocol at the end of this process - if the portion before the first colon is not 'http' or 'https' then prepend 'http://' to the string. This allows you to create usable links from incomplete input as a user would type into a browser and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
3. Check that the result is a well formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
4. Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.

If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.
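The clean-up and well-formedness steps could be sketched roughly as follows in C#. This is an illustration under assumptions, not a complete filter: the class name LinkCleaner and method Clean are hypothetical, the unescape/re-encode step and the blacklist check are left out, and Uri.TryCreate stands in for the "well formed URL" test.

```csharp
using System;

// Hypothetical helper illustrating the clean-up and well-formedness steps only.
public static class LinkCleaner
{
    // Returns a normalized http(s) URL, or null when the input is thrown away.
    public static string Clean(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            return null;

        string candidate = input.Trim();

        // Throw the string away if it contains a character that could close
        // the attribute in the output (double or single quote).
        if (candidate.IndexOf('"') >= 0 || candidate.IndexOf('\'') >= 0)
            return null;

        // Force the protocol: if the part before the first colon is not
        // http or https, prepend "http://".
        int colon = candidate.IndexOf(':');
        string prefix = colon < 0 ? "" : candidate.Substring(0, colon).ToLowerInvariant();
        if (prefix != "http" && prefix != "https")
            candidate = "http://" + candidate;

        // Well-formedness check, delegated to the framework's URI parser.
        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
            return null;

        // Redundant after forcing the protocol, kept as a last line of defence.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return null;

        return uri.AbsoluteUri;
    }
}
```

Under these assumptions, stackoverflow.com comes back as http://stackoverflow.com/, while javascript:alert(1) is first turned into http://javascript:alert(1) and should then fail the parse, since alert(1) is not a valid port.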

Bell 2008-10-16 20:08:37
