Hey everyone,

I have a security-related question. My web application allows users to input URLs. The URL is immediately stored in the database (no sanitization at this point; is this wrong?). I'm using LINQ to SQL, so the query is already parameterized. When displaying the hyperlink back to the user, I'm using a repeater. Do I need to encode the hyperlink text as well as the tooltip and href property, or do I only have to encode the text (which is displayed)? Also, I assume URL encoding is what I need here, but do I also have to use HTML encoding?

I tried `Server.UrlEncode` on all three properties where the text was `<script> alert("hello") </script>` and it seemed to mess up the href and text. I'm guessing this means that it's not fully secured?

Edit - I should add: if I encode on output, how can I make it so that a "/" is displayed instead of "%2f"? Thanks

+1  A: 

Is this wrong? Yes. Sanitize on input, not output.

If you sanitize before saving to the db, there is no need to encode when outputting. Rule of thumb: trust your data on all layers of your app, so sanitize early.

Andrew Kolesnikov
I was following the advice from `http://msdn.microsoft.com/en-us/library/bb355989.aspx` - it says `Avoid the mistake of encoding the data early. Make sure you encode at the last possible opportunity before the data is displayed to the client.`
Skoder
+3  A: 

Do you allow arbitrary http(s) links/text?

The text (innerHtml of the anchor tag) must be HTML-entity encoded. As for the href:

First, on input, at a minimum check that the URL really is an http or https link with only valid characters in the hostname and path (follow the RFC, but feel free to constrain further; note that punycode is used for non-ASCII domain names, so the whitelist of characters is short). This prevents javascript: URLs, username:password@hostname URLs used for phishing, ftp://, kindle://, and other schemes, use of \ in the URL (converted to / by IE, but it might confuse your reading of the domain), use of excessive blank space such as www.good%20{N times}evil.com, etc.

If you allow query parameters, URL-encode the individual names and values, bearing in mind that they still affect the target (and don't HTML-entity encode them as well). Strip # and anything after it, since that is not sent to the target anyway. Enclose the href in double quotes.
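The input checks described above could be sketched roughly as follows. This is Python rather than .NET, purely for illustration, and it is not part of the original answer: the function name `clean_user_url` and both character whitelists are assumptions you would tighten or loosen for your own site.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: a deliberately narrow hostname pattern (ASCII letters, digits,
# hyphens, dots; at least one dot). Punycode names such as xn--... still pass.
_HOST_RE = re.compile(
    r"^(?=.{1,253}$)[a-z0-9]([a-z0-9-]*[a-z0-9])?"
    r"(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$"
)
# Assumption: characters tolerated in the path and query. Quotes, angle
# brackets, spaces, and backslashes are all absent from this set.
_PATH_QUERY_RE = re.compile(r"^[A-Za-z0-9._~!$&()*+,;=:@/?%-]*$")


def clean_user_url(raw: str) -> Optional[str]:
    """Return a normalized http(s) URL, or None if the input is rejected."""
    raw = raw.strip()
    # Reject backslashes, inner whitespace, and control characters outright.
    if not raw or "\\" in raw or any(c.isspace() or ord(c) < 0x20 for c in raw):
        return None
    try:
        parts = urlsplit(raw)
        parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    # Only plain web links: no javascript:, ftp:, and so on.
    if parts.scheme not in ("http", "https"):
        return None
    # Block user:password@host style URLs.
    if "@" in parts.netloc:
        return None
    # Percent-escapes or other odd characters in the host fail this match.
    if not _HOST_RE.match(parts.hostname or ""):
        return None
    if not _PATH_QUERY_RE.match(parts.path) or not _PATH_QUERY_RE.match(parts.query):
        return None
    # Rebuild without the fragment (the part after #).
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, parts.query, ""))
```

With this sketch, `clean_user_url("http://example.com/a/b?x=1#frag")` gives `"http://example.com/a/b?x=1"`, while `javascript:alert(1)`, `http://user:pw@example.com/`, and `http://www.good%20%20evil.com/` all come back as `None`. Rejecting outright instead of trying to repair a bad URL keeps the rules easy to reason about.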

It may be a good idea to warn users when they navigate away from your site, if applicable. Note that the target site will get the page URL as the referrer. Another option is to allow only links to whitelisted domains that you know are not harmful (apart from being the responsible thing to do, this prevents your site from being identified as linking to harmful sites by Netcraft, Google, etc.).

mar
Thanks for the reply. The links will be displayed in the user's profile area. For example, their profile says 'Home-page'. When other users visit the person's profile, they can then see the link. Therefore, I can't really check where the links are heading, but want to ensure that they won't cause problems to users of my site who don't actually visit the link.
Skoder
+1  A: 

You should sanitize on input (this is different from escaping) straight away. This means performing some sort of validation that the data is a URL or, at the very least, that it only contains characters allowed in a URL. Use a regex to do this, or a URL parsing library (sorry, I'm not too familiar with .NET's API).

You should encode on output, unless you're using the value as a URL in an HTML element (which you are!), in which case you shouldn't do any encoding. You'll definitely need to encode the tooltip and the text in the body of the link tag. I would think extra hard about how you sanitize the input before it's entered into the database. I suggest browsing this fantastic resource of example XSS attacks.

The reason you encode on output, not when saving to the database, is that each output medium may have different encoding/escaping rules: HTML is different from JavaScript, which is different from, say, PDF, Flash, or CSS.
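To make the difference between output contexts concrete, here is a small sketch (in Python rather than .NET, and not part of the original answer) that runs the same strings through HTML escaping and through URL-component encoding. The helper names `for_html` and `for_url_component` are made up for this example.

```python
import html
from urllib.parse import quote


def for_html(value: str) -> str:
    """Escape for HTML text or a quoted attribute: only & < > " ' change."""
    return html.escape(value, quote=True)


def for_url_component(value: str) -> str:
    """Percent-encode for use as a single URL component (e.g. a query value)."""
    return quote(value, safe="")


payload = '<script> alert("hello") </script>'
link = "http://example.com/a/b"

for_html(payload)           # '&lt;script&gt; alert(&quot;hello&quot;) &lt;/script&gt;'
for_url_component(payload)  # '%3Cscript%3E%20alert%28%22hello%22%29%20%3C%2Fscript%3E'

for_html(link)              # 'http://example.com/a/b' (unchanged, "/" stays readable)
for_url_component(link)     # 'http%3A%2F%2Fexample.com%2Fa%2Fb'
```

This is why URL-encoding a whole link mangles it: the slashes and colon are themselves percent-encoded, whereas HTML escaping leaves an ordinary URL readable when it is shown as text.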

Also I assume you're using prepared statements when saving to the database, to avoid SQL Injection?

Mike
Thanks. I will be checking that URLs are valid on entry using a regex I've got. If I encode the text, it comes up as `%2f` instead of a '/'. How would I overcome this? When I post to a forum, for example, the URL is maintained. Does this mean that they don't encode on output and assume that the input is valid?
Skoder
Correct, they don't encode on output, as the URL is already in the format required for the A tag's href attribute. Now, if they wanted to display the URL as the link text, then they would encode it, e.g. `<a href="nonencoded_url">encoded_url</a>`. Hope that makes more sense.
Mike
Thanks Mike, that makes sense.
Skoder