For example, if I am collecting a [URL value] in a form, saving that [URL value] in a database, and then using it in a page like this:

<a href="[URL value]" > The Link </a>

How do I protect against this [URL value]:

http://www.somelink.com"> Evil text or can be empty </a>  ALL THE EVIL HTML I WANT  <a href="

How can I protect against this kind of HTML injection in URL form fields without breaking the URL when it is valid?

+5  A: 

When receiving the URL on the form:

  • Use filter_var($url, FILTER_VALIDATE_URL) to ensure the URL is in a valid format.
  • Ensure the URL starts with http:// or https:// (or at least reject all javascript: URLs, as they can carry malicious code).
  • Use prepared statements when inserting the URL (and other form data) into the database, or properly escape that data, to prevent SQL injection.


When displaying the page:

  • Use htmlspecialchars() to escape the URL (and all other text) that you insert in the HTML.
Alexandre Jasmin
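For readers who want to see the steps above together, here is a minimal sketch, not the answerer's own code. The function names is_acceptable_url(), store_url() and render_link(), and the `links` table, are assumptions made up for illustration; only filter_var(), preg_match(), PDO and htmlspecialchars() are real PHP. The syntax check is only a first filter, so the escaping in render_link() is still applied to every URL, valid or not.

```php
<?php
// Sketch only: is_acceptable_url(), store_url(), render_link() and the
// `links` table are hypothetical names, not part of PHP or the question.

function is_acceptable_url(string $url): bool
{
    // Syntax check: filter_var() returns false for a malformed URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Accept only http:// and https://, which also rejects javascript: URLs.
    return preg_match('#^https?://#i', $url) === 1;
}

function store_url(PDO $pdo, string $url): void
{
    // Prepared statement: the URL is bound as data, never spliced into the SQL.
    $stmt = $pdo->prepare('INSERT INTO links (url) VALUES (:url)');
    $stmt->execute([':url' => $url]);
}

function render_link(string $url, string $label): string
{
    // ENT_QUOTES escapes both " and ', so the value cannot close the attribute.
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $text = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $text . '</a>';
}

// Escaping alone neutralises a payload like the one in the question:
echo render_link('http://www.somelink.com"> Evil </a> <a href="', 'The Link'), "\n";
```

The last line prints `<a href="http://www.somelink.com&quot;&gt; Evil &lt;/a&gt; &lt;a href=&quot;">The Link</a>`, so the injected markup stays inside the attribute as plain text. A valid URL containing `&` comes out with `&amp;`, which the browser decodes back to the original URL.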
That's it! If I apply htmlspecialchars to every URL (bad or valid), it does not break the URL, but it protects against the HTML injection.
Code Burn
A: 

Use urlencode to encode just the " character when printing the URL, like:

echo '<a href="'.str_replace('"', urlencode('"'), $url).'">link name</a>';
cambraca
If I do this I will get a local link.
Code Burn
Like http://localhost:8888/http://www.submitedLink.com
Code Burn
Alexandre Jasmin
Exactly, this breaks it. There must be a simple way to do this.
Code Burn
Oh, you're right. Maybe just replace `"` with the appropriately encoded string (look at the edit).
cambraca
Don't use `urlencode` for escaping HTML characters; use `htmlspecialchars` or `htmlentities`.
Phil Brown
@Phil Brown: urlencode is for encoding strings for use in URLs. It will encode the special characters in HTML tags and prevent an attempt to create a new hyperlink, which is what the asker wanted to avoid.
emurano
@emurano See Alexandre's comment above. `urlencode` will break the URL. `htmlspecialchars` will convert any characters such as `<`, `>` and most importantly `"` into their HTML entity equivalents, thus negating any potential HTML injection.
Phil Brown
@Phil Brown Ah yes I misinterpreted the question. The untrusted data is the *whole* url. I thought the asker meant that he wanted to have the untrusted data as the value of one of the key/value pairs.
emurano
+1  A: 

The simplest way to do that would be to check that the input looks like a syntactically valid URL, with no characters such as > which are not allowed in URLs. The easiest way to do that is with the filter extension. The code would look like this:

if (filter_var($url, FILTER_VALIDATE_URL)) {
    //Valid URL submitted
} else {
    //Invalid URL submitted
}
Jeremy
Is there any way to do this without an extension?
Code Burn
This does not work in PHP 4. I need a PHP 4 solution.
Code Burn
Don't use PHP 4; it's ancient.
Alexandre Jasmin
I'd have to agree with Alexandre, but if you're really stuck with PHP 4 the only solution would be to check for a valid URL with a Regex.
Jeremy
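As a rough illustration of the regex route mentioned in the comment above, here is a sketch that uses only preg_match(). The function name looks_like_http_url() and the pattern are assumptions chosen for this example; the pattern is deliberately crude and far from a full URL grammar. It is written without type declarations to stay close to older PHP syntax, but it has not been tested on PHP 4.

```php
<?php
// Sketch only: looks_like_http_url() is a hypothetical helper, and the
// pattern is a rough approximation rather than a complete URL grammar.

function looks_like_http_url($url)
{
    // http:// or https://, followed by one or more characters that are not
    // whitespace, quotes, or angle brackets, up to the very end of the string.
    return preg_match('#^https?://[^\s"\'<>]+\z#i', $url) === 1;
}
```

A check like this rejects the payload from the question because of the quote, spaces and angle brackets it contains. Output should still be escaped with htmlspecialchars, since a regex this simple makes no promise about every other character.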
A: 

EDIT: Just use urlencode, do not use htmlentities as well

Whenever you put data into the key/value pairs of a URL, you should encode that data using urlencode(). This function takes care of any special characters that have syntactic meaning in the way URLs are constructed. Ampersands, equals signs and question marks will be encoded for you. All the angle brackets AND newline characters in the injected HTML will be encoded too!

<?php $TheVariable = $_REQUEST['suspect_var']; // Untrusted data ?>


<a href="http://www.mysite.com/script.php?untrusted_data=<?php echo urlencode($TheVariable) ?>">The Link</a>
emurano
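To make the effect of urlencode() on a single query value concrete, here is a small sketch. The helper name build_link_with_param() is made up for illustration; the base URL and the parameter name are taken from the example above. Note that this covers only the case where the untrusted data is one value inside a URL you control, not the case where the whole URL is untrusted.

```php
<?php
// Sketch only: build_link_with_param() is a hypothetical helper.

function build_link_with_param(string $value): string
{
    // urlencode() turns quotes, angle brackets, ampersands etc. into %XX
    // sequences (and spaces into +), so the value cannot leave the attribute.
    $href = 'http://www.mysite.com/script.php?untrusted_data=' . urlencode($value);
    return '<a href="' . $href . '">The Link</a>';
}

echo build_link_with_param('"> <b>evil</b>'), "\n";
```

This prints `<a href="http://www.mysite.com/script.php?untrusted_data=%22%3E+%3Cb%3Eevil%3C%2Fb%3E">The Link</a>`.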
A: 

For escaping when displaying, use htmlentities($var, ENT_QUOTES) or htmlspecialchars($var, ENT_QUOTES). Both single and double quotes need to be escaped because of browser-specific XSS payloads. See http://ha.ckers.org/xss2.html

Also, when validating the URL, javascript: is not the only dangerous URI scheme. Another one is data:.

Anyway, it's always more secure to exclude everything except what is whitelisted than to include everything except what is blacklisted.

p0deje
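The whitelist idea from the answer above could be sketched like this. The function name has_allowed_scheme() and the default allow-list of http and https are assumptions for the example; parse_url(), strtolower() and in_array() are standard PHP. Anything whose scheme is missing or not on the list is rejected, which covers javascript: and data: without having to name them.

```php
<?php
// Sketch only: has_allowed_scheme() and its default allow-list are hypothetical.

function has_allowed_scheme(string $url, array $allowed = ['http', 'https']): bool
{
    // parse_url() gives null when there is no scheme and false when the
    // URL is too malformed to parse; both cases are rejected here.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false;
    }
    // Schemes are case-insensitive, so compare in lower case.
    return in_array(strtolower($scheme), $allowed, true);
}
```

This only checks the scheme. It says nothing about the rest of the URL, so escaping on output is still needed.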