ansaurus

Question

Are these two functions overkill for sanitization?

Answer 1

+4 A:

It's true, but this level of escaping may not be appropriate in all cases. What if you want to store HTML in a database?

Best practice dictates that, rather than escaping on receiving values, you should escape them when you display them. This allows you to account for displaying both HTML from the database and non-HTML from the database, and it's really where this sort of code logically belongs, anyway.

Another advantage of sanitizing outgoing HTML is that a new attack vector may be discovered, in which case sanitizing incoming HTML won't do anything for values that are already in the database, while outgoing sanitization will apply retroactively without having to do anything special

Also, note that strip_tags in your first function will likely have no effect, if all of the < and > have become < and >.

Matchu 2010-05-30 19:50:36

Another advantage of sanitizing outgoing HTML is that a new attack vector may be discovered, in which case sanitizing incoming HTML won't do anything for values that are already in the database, while outgoing sanitization will apply retroactively without having to do anything special.

Dave Sherohman 2010-05-30 20:13:37

@Dave Sherohman: I kinda tried to express that concept, but you explained it better. I'll edit in your wording.

Matchu 2010-05-30 20:39:07

So what about the sql injection vulnerability?

Rook 2010-05-30 22:28:41

@The Rook - definitely use `mysql_real_escape_string`, of course! The specific point here is that sanitizing HTML on entering it into the database is not best practice.

Matchu 2010-05-30 23:17:26

Answer 2

A:

Well, if you don't want to reinvent the wheel you can use HTMLPurifier. It allows you to decide exactly what you want and what you don't want and prevents XSS attacks and such

nico 2010-05-30 19:50:48

Answer 3

+3 A:

You are doing htmlentities (which turns all > into >) and then calling strip_tags which at this point will not accomplish anything more, since there are no tags.

webdestroya 2010-05-30 19:51:31

Yeah, I think I should get rid of strip_tags. Kinda pointless

jpjp 2010-05-30 19:52:25

Or call it before htmlentities

stagas 2010-05-30 20:35:52

Answer 4

+9 A:

To be honest, I think the author of these function has either no idea what XSS and SQL injections are or what exactly the used function do.

Just to name two oddities:

Using stripslashes after mysql_real_escape_string removes the slashes that were added by mysql_real_escape_string.
htmlentities replaces the chatacters < and > that are used in strip_tags in order to identify tags.

Furthermore: In general, functions that protect agains XSS are not suitable to protect agains SQL injections and vice versa. Because each language and context hast its own special characters that need to be taken care of.

My advice is to learn why and how code injection is possible and how to protect against it. Learn the languages you are working with, especially the special characters and how to escape these.

Edit Here’s some (probably weird) example: Imagine you allow your users to input some value that should be used as a path segment in a URI that you use in some JavaScript code in a onclick attribute value. So the language context looks like this:

HTML attribute value
- JavaScript string
  - URI path segment

And to make it more fun: You are storing this input value in a database.

Now to store this input value correctly into your database, you just need to use a proper encoding for the context you are about to insert that value into your database language (i.e. SQL); the rest does not matter (yet). Since you want to insert it into a SQL string declaration, the contextual special characters are the characters that allow you to change that context. As for string declarations these characters are (especially) the ", ', and \ characters that need to be escaped. But as already stated, prepared statements do all that work for you, so use them.

Now that you have the value in your database, we want to output them properly. Here we proceed from the innermost to the outermost context and apply the proper encoding in each context:

For the URI path segment context we need to escape (at least) all those characters that let us change that context; in this case / (leave current path segment), ?, and # (both leave URI path context). We can use rawurlencode for this.
For the JavaScript string context we need to take care of ", ', and \. We can use json_encode for this (if available).
For the HTML attribute value we need to take care of &, ", ', and <. We can use htmlspecialchars for this.

Now everything together:

'… onclick="'.htmlspecialchars('window.open("http://example.com/'.json_encode(rawurlencode($row['user-input'])).'")').'" …'

Now if $row['user-input'] is "bar/baz" the output is:

… onclick="window.open(&quot;http://example.com/&amp;quot;%22bar%2Fbaz%22&amp;quot;&amp;quot;)" …

But using all these function in these contexts is no overkill. Because although the contexts may have similar special characters, they have different escape sequences. URI has the so called percent encoding, JavaScript has escape sequences like \" and HTML has character references like ". And not using just one of these functions will allow to break the context.

Gumbo 2010-05-30 20:08:07

I think I'm just going to stick with mysql_real_escape_string and prepared statements for sql injections and htmlentities for xss.

jpjp 2010-05-30 20:16:16

@jpjp: That’s a good way to go! And if you’re using a character encoding that can encode Unicode (like UTF-8), you can even just use `htmlspecialchars` instead of `htmlentities`.

Gumbo 2010-05-30 20:19:27

Is mysql_real_escape_string required for prepared statements?

stagas 2010-05-30 20:35:09

@stagas - nope. The database system handles the replacement internally. That's a lot of the point of using them, actually.

Matchu 2010-05-30 20:40:23

@stagas: No. Prepared statements use placeholders to mark the positions where the actual values are inserted. Then the statement is sent to the database system that distinguishes the types. Then you just send the values separately to the database system and it does the rest. (See http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html)

Gumbo 2010-05-30 20:42:32

+1 you are correct sir.

Rook 2010-05-30 22:59:48

Answer 5

+2 A:

If you're using prepared statements and SQL placeholders and never interpolating user input directly into your SQL strings, you can skip the SQL sanitization entirely.

When you use placeholders, the structure of the SQL statement (SELECT foo, bar, baz FROM my_table WHERE id = ?) is send to the database engine separately from the data values which are (eventually) bound to the placeholders. This means that, barring major bugs in the database engine, there is absolutely no way for the data values to be misinterpreted as SQL instructions, so this provides complete protection from SQL injection attacks without requiring you to mangle your data for storage.

Dave Sherohman 2010-05-30 20:19:47

So after I mysql_real_escape_string the variable(say $name), it would be to just insert into the db by prepared statements?

jpjp 2010-05-30 20:24:30

@jpjp: If you use prepared statements you don't need to do anything to the string, no mysql_real_escape_string, no sanitization of any kind. For more, google PHP PDO.

TheMagician 2010-05-30 20:29:59

Answer 6

+2 A:

No, this isn't overkill this is a vulnerability.

This code completely vulnerable to SQL Injection. You are doing a mysql_real_escape_string() and then you are doing a stripslashes(). So a " would become \" after mysql_real_escape_string() and then go back to " after the stripslashes(). mysql_real_escape_string() alone is best to stop sql injection. Parameterized query libraries like PDO and ADODB uses it, and Parameterized queries make it very easy to completely stop sql injection.

Go ahead test your code:

$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
mysql_query("select * from mysql.user where Host='".$variable."'");

What if:

$_POST['user_input'] = 1' or 1=1 /*

Patched:

mysql_query("select * from mysql.user where Host='".mysql_real_escape_string($variable)."'");

This code is also vulnerable to some types XSS:

$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
print("<body background='http://localhost/image.php?".$variable."' >");

What if:

$_POST['user_input']="' onload=alert(/xss/)";

patched:

$variable=htmlspecialchars($variable,ENT_QUOTES);
print("<body background='http://localhost/image.php?".$variable."' >");

htmlspeicalchars is encoding single and double quotes, make sure the variable you are printing is also encased in quotes, this makes it impossible to "break out" and execute code.

Rook 2010-05-30 22:23:57

manual sanitization is the root of all evil.

Longpoke 2010-05-30 22:56:30

But the `"` would become `"` after `htmlentities`. The reason why your example works is just because you use `'` instead of `"` in the HTML attribute value and MySQL string declaration. And the `'` is only replaced by `htmlentities` using the *ENT\_QUOTES* quote style: `$var = htmlentities($var, ENT_QUOTES);` would fix this too. The MySQL statement would result in `select * from mysql.user where Host='1' or 1=1 /*'` and the HTML example in `<body background='http://localhost/image.php?' onload=alert(/xss/)' >`.

Gumbo 2010-05-30 23:17:59

No, mysql_real_escape_string() alone is *not* best to stop SQL injection. The way to get *absolute* protection is by using parametrized queries, which send the commands and the data to the database engine via separate channels, making SQL injection *impossible*. Do not escape queries. Do not sanitize queries. Use prepared statements and placeholders. (As, incidentally, the OP said he is doing.)

Dave Sherohman 2010-05-30 23:57:55

@Dave Sherohman Both ADODB and PDO use mysql_real_escape_string() to power their parameterized quires, and yes i highly recommend using parameterized quires and i also highly recommend reading someones post before commenting.

Rook 2010-05-31 00:07:18

@Gumbo right again, however don't give people the idea that htmlentities should be used to protect against sql injection, because it doesn't stop backslashes and that is as dangerous as quote marks. There are also language encoding issues that mysql_real_escape_stirng() will stop.

Rook 2010-05-31 00:12:13

1) ADODB/PDO use mysql_real_escape_string to emulate prepared statements *on versions of MySQL which lack support for parametrized queries*. If your db supports the real thing, ADODB/PDO will use them, *not* escaping. 2) If I hadn't read your post, how would I have known that you claimed, and I quote, "mysql_real_escape_string() alone is best to stop sql injection"? It is *not* the best way. Parametrized queries are more effective and don't mangle your data - unless your db doesn't support them, in which case the driver falls back to escaping as the next-best thing.

Dave Sherohman 2010-05-31 07:27:28

Answer 7

A:

I wonder about the concept of sanitization. You're telling Mysql to do exactly what you want it to do: run a query statement authored in part by the website user. You're already constructing the sentence dynamically using user input - concatenating strings with data supplied by the user. You get what you ask for.

Anyway, here's some more sanitization methods...

1) For numeric values, always manually cast at least somewhere before or while you build the query string: "SELECT field1 FROM tblTest WHERE(id = ".(int) $val.")";

2) For dates, convert the variable to unix timestamp first. Then, use the Mysql FROM_UNIXTIME() function to convert it back to a date. "SELECT field1 FROM tblTest WHERE(date_field >= FROM_UNIXTIME(".strtotime($val).")";. This is actually needed sometimes anyway to deal with how Mysql interprets and stores dates different from the script or OS layers.

3) For short and predictable strings that must follow a certain standard (username, email, phone number, etc), you can a) do prepared statements; or b) regex or other data validation.

4) For strings which wouldn't follow any real standard and which may or may not have pre- or double-escaped and executable code all over the place (text, memos, wiki markup, links, etc), you can a) do prepared statements; or b) store to and convert from binary/blob form - converting each character to binary, hex, or decimal representation before even passing the value to the query string, and converting back when extracting. This way you can focus more on just html validation when you spit the stored value back out.

bob-the-destroyer 2010-06-01 00:50:13

-1: Don't concatenate or interpolate user-supplied data directly into your query strings. Period. Use parametrized queries (aka prepared statements/placeholders) and SQL injection will be a thing of the past. Note that OP stated "Also for sanitizing, I use prepared statements to prevent sql injections." As he is already using parametrized queries, none of these half-measure sanitization techniques are needed, so please do not present them as being an appropriate means of protecting against SQL injection.

Dave Sherohman 2010-06-01 09:18:20

@Dave. I admit I got carried away, but I disagree with you on the uselessness of my suggestions. Prepared statements are a way to prevent sql injection, yes. But not if a different db interface is later used, which could leave everyone sharing the data open either by exploitable holes or without realizing it, recommitting the sql-injection laden text which was previously thought safely stored. Furthermore, it lends nothing to ensuring database integrity, nor nullifying xss attacks, nor stripping html which could effectively redraw your entire site.

bob-the-destroyer 2010-06-01 20:38:35

<i>Furthermore, it lends nothing to ensuring database integrity</i>To clarify, you're apparently limited to marking a variable as "idsb". I do not know myself if this will force the type. Still this isn't sufficient sometimes for data validation and integrity. Some users do have problems with blobs even when adjusting max_allowed_packet to compensate. Some users find mysqli buggy. And, some hosts are strict with db connections and queries, so a simple one-time query spread out over two db calls using PS is sometimes wasteful. So, PS are still not the end-all, be-all solution.

bob-the-destroyer 2010-06-01 21:38:26

True, parametrized queries do nothing for ensuring that you have stored consistent data, XSS, or HTML-based attacks. And I never said they did. The sanitization techniques presented in your answer were framed primarily as methods for preventing SQL injection and I only said that parametrized queries are superior for that specific purpose. Other methods must be used to protect against other forms of attack and data sanitization is generally a part of those methods - but none of them require the interpolation of user data into SQL query strings.

Dave Sherohman 2010-06-01 23:20:37

ansaurus

tags:

views:

answers:

Are these two functions overkill for sanitization?

related questions