views:

4904

answers:

4

Earlier today a question was asked regarding input validation strategies in web apps.

The top answer, at time of writing, suggests in PHP just using htmlspecialchars and mysql_real_escape_string.

My question is: Is this always enough? Is there more we should know? Where do these functions break down?

+57  A: 

When it comes to database queries, always try and use prepared parameterised queries. The mysqli and PDO libraries support this. This is infinitely safer than using escaping functions such as mysql_real_escape_string.

Yes, mysql_real_escape_string is effectively just a string escaping function. It is not a magic bullet. All it will do is escape dangerous characters in order that they can be safe to use in a single query string. However, if you do not sanitise your inputs beforehand, then you will be vulnerable to certain attack vectors.

Imagine the following SQL:

$result = "SELECT fields FROM table WHERE id = ".mysql_real_escape_string($_POST['id']);

You should be able to see that this is vulnerable to exploit.
Imagine the id parameter contained the common attack vector:

1 OR 1=1

There's no risky chars in there to encode, so it will pass straight through the escaping filter. Leaving us:

SELECT fields FROM table WHERE id = 1 OR 1=1

Which is a lovely SQL injection vector.

Whilst these functions are useful, they must be used with care. You need to ensure that all web inputs are validated to some degree. In this case, we see that we can be exploited because we didn't check that a variable we were using as a number, was actually numeric. In PHP you should widely use a set of functions to check that inputs are integers, floats, alphanumeric etc. But when it comes to SQL, heed most the value of the prepared statement. The above code would have been secure if it was a prepared statement as the database functions would have known that 1 OR 1=1 is not a valid literal.

As for htmlspecialchars(). That's a minefield of its own.

There's a real problem in PHP in that it has a whole selection of different html-related escaping functions, and no clear guidance on exactly which functions do what.

Firstly, if you are inside an HTML tag, you are in real trouble. Look at

echo '<img src= "' . htmlspecialchars($_GET['imagesrc']) . '" />';

We're already inside an HTML tag, so we don't need < or > to do anything dangerous. Our attack vector could just be javascript:alert(document.cookie)

Now resultant HTML looks like

<img src= "javascript:alert(document.cookie)" />

The attack gets straight through.

It gets worse. Why? because htmlspecialchars only encodes double quotes and not single. So if we had

echo "<img src= '" . htmlspecialchars($_GET['imagesrc']) . ". />";

Our evil attacker can now inject whole new parameters

pic.png' onclick='location.href=xxx' onmouseover='...

gives us

<img src='pic.png' onclick='location.href=xxx' onmouseover='...' />

In these cases, there is no magic bullet, you just have to santise the input yourself. If you try and filter out bad characters you will surely fail. Take a whitelist approach and only let through the chars which are good. Look at the XSS cheat sheet for examples on how diverse vectors can be

Even if you use htmlspecialchars($string) outside of HTML tags, you are still vulnerable to multi-byte charset attack vectors.

The most effective you can be is to use the a combination of mb_convert_encoding and htmlentities as follows.

$str = mb_convert_encoding($str, ‘UTF-8′, ‘UTF-8′);
$str = htmlentities($str, ENT_QUOTES, ‘UTF-8′);

Even this leaves IE6 vulnerable, because of the way it handles UTF. However, you could fall back to a more limited encoding, such as ISO-8859-1, until IE6 usage drops off.

Cheekysoft
The only thing missed here, is that the first example for the DB query ... a simple intval() would solve the injection. Always use intval() in place of mysqlescape...() when needing a number and not a string.
The Wicked Flea
and remember that using parameterized queries will allow you to always have data treated as data and not code. Use a library such as PDO and use parameterised queries whenever possible.
Cheekysoft
Excellent answer!
joedevon
@The Wicked Flea I would argue type casting with (int) is better. It's faster, and one less parenthesis you need to close on the other side.
alex
+1 : I just had to vote up this 1-year old answer. This even thought me a thing or 2 about XSS.
Duroth
+3  A: 

In addition to Cheekysoft's excellent answer:

  • Yes, they will keep you safe, but only if they're used absolutely correctly. Use them incorrectly and you will still be vulnerable, and may have other problems (for example data corruption)
  • Please use parameterised queries instead (as stated above). You can use them through e.g. PDO or via a wrapper like PEAR DB
  • Make sure that magic_quotes_gpc and magic_quotes_runtime are off at all times, and never get accidentally turned on, not even briefly. These are an early and deeply misguided attempt by PHP's developers to prevent security problems (which destroys data)

There isn't really a silver bullet for preventing HTML injection (e.g. cross site scripting), but you may be able to achieve it more easily if you're using a library or templating system for outputting HTML. Read the documentation for that for how to escape things appropriately.

In HTML, things need to be escaped differently depending on context. This is especially true of strings being placed into Javascript.

MarkR
A: 

I would definitely agree with the above posts, but I have one small thing to add in reply to Cheekysoft's answer, specifically:

When it comes to database queries, always try and use prepared parameterised queries. The mysqli and PDO libraries support this. This is infinitely safer than using escaping functions such as mysql_real_escape_string.

Yes, mysql_real_escape_string is effectively just a string escaping function. It is not a magic bullet. All it will do is escape dangerous characters in order that they can be safe to use in a single query string. However, if you do not sanitise your inputs beforehand, then you will be vulnerable to certain attack vectors.

Imagine the following SQL:

$result = "SELECT fields FROM table WHERE id = ".mysql_real_escape_string($_POST['id']);

You should be able to see that this is vulnerable to exploit. Imagine the id parameter contained the common attack vector:

1 OR 1=1

There's no risky chars in there to encode, so it will pass straight through the escaping filter. Leaving us:

SELECT fields FROM table WHERE id = 1 OR 1=1

I coded up a quick little function that I put in my database class that will strip out anything that isnt a number. It uses preg_replace, so there is prob a bit more optimized function, but it works in a pinch...

function Numbers($input) {
  $input = preg_replace("/[^0-9]/","", $input);
  if($input == '') $input = 0;
  return $input;
}

So instead of using

$result = "SELECT fields FROM table WHERE id = ".mysqlrealescapestring("1 OR 1=1");

I would use

$result = "SELECT fields FROM table WHERE id = ".Numbers("1 OR 1=1");

and it would safely run the query

SELECT fields FROM table WHERE id = 111

Sure, that just stopped it from displaying the correct row, but I dont think that is a big issue for whoever is trying to inject sql into your site ;)

BrilliantWinter
Perfect! This is the exactly kind of sanitisation you need. The initial code failed because it didn't validate that a number was numeric. Your code does this. you should call Numbers() on all integer-use vars whose values originate from outside the codebase.
Cheekysoft
It's worth mentioning that intval() will work perfectly fine for this, since PHP automatically coerces integers to strings for you.
Adam Ernst
I prefer intval. It turns 1abc2 to 1, not 12.
jmucchiello
+1  A: 

An important piece of this puzzle is contexts. Someone sending "1 OR 1=1" as the ID is not a problem if you quote every argument in your query:

SELECT fields FROM table WHERE id='".mysql_real_escape_string($_GET['id'])."'"

Which results in:

SELECT fields FROM table WHERE id='1 OR 1=1'

which is ineffectual. Since you're escaping the string, the input cannot break out of the string context. I've tested this as far as version 5.0.45 of MySQL, and using a string context for an integer column does not cause any problems.

Lucas Oman
and then i'll start my attack vector with the multi-byte char 0xbf27 which in your latin1 database will be converted by the filter fuction as 0xbf5c27 - which is a single multibyte character followed by a single quote.
Cheekysoft
Try not to safeguard against a single known attack-vector. You will end up chasing your tail until the end of time applying patch after patch to your code. Standing back and looking at the general cases will leaed to safer code and a better security-focussed mindset.
Cheekysoft
I agree; ideally, OP will use prepared statements.
Lucas Oman