Hi all. I'm working on an older server where I can't use prepared statements, so I am trying to fully escape user input before sending it to MySQL. For this I am using the PHP function mysql_real_escape_string.

Since this function does not escape the MySQL wildcards % and _ I am using addcslashes to escape these as well.

When I send something like:

test_test " ' 

to the database and then read it back the database shows:

test\_test " ' 

Looking at this I can't understand why the _ has a preceding backslash but the " and ' don't. Since they are all escaped with \ surely _ ' and " should all appear the same, i.e. all have the escape character visible or all not have it visible.

Are the escaping \s automatically screened out for

Can anyone explain this?

+3  A: 

_ and % are not wildcards in MySQL in general, and should not be escaped for the purposes of putting them into normal string literals. mysql_real_escape_string is correct and sufficient for this purpose. addcslashes should not be used.
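To see why the value in the question came back as test\_test " ', it may help to simulate how MySQL reads backslash sequences inside a quoted string literal. The sketch below is only a simulation based on the behaviour described in the MySQL manual (the function name mysql_literal_decode_demo is made up for illustration and is not a real API): \' and \" decode to the bare quote, but \_ and \% keep their backslash, so the backslash added by addcslashes ends up stored in the row.

```php
<?php
// Hypothetical helper: imitates how MySQL decodes backslash escapes in the
// body of a quoted string literal. Not a real MySQL or PHP function.
function mysql_literal_decode_demo(string $body): string {
    // Escape sequences that decode to a single character.
    $map = [
        '0' => "\0", "'" => "'", '"' => '"', 'b' => "\x08",
        'n' => "\n", 'r' => "\r", 't' => "\t", 'Z' => "\x1A", '\\' => '\\',
    ];
    $out = '';
    $len = strlen($body);
    for ($i = 0; $i < $len; $i++) {
        $c = $body[$i];
        if ($c !== '\\' || $i + 1 >= $len) {
            $out .= $c;               // ordinary character (or a trailing backslash)
            continue;
        }
        $next = $body[++$i];
        if ($next === '%' || $next === '_') {
            $out .= '\\' . $next;     // \% and \_ keep the backslash
        } elseif (isset($map[$next])) {
            $out .= $map[$next];      // known escape sequence
        } else {
            $out .= $next;            // unknown sequence: backslash is dropped
        }
    }
    return $out;
}

// What PHP sends after mysql_real_escape_string plus addcslashes:
$sent = 'test\\_test \\" \\\'';
echo mysql_literal_decode_demo($sent), "\n";
```

So the quotes were escaped only for the trip through the SQL parser, while the extra backslash before _ became part of the data.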

_ and % are special solely in the context of LIKE-matching. When you want to prepare strings for literal use in a LIKE statement, so that 100% matches one-hundred-percent and not just any string starting with a hundred, you have two levels of escaping to worry about.

The first is LIKE escaping. LIKE handling takes place entirely inside SQL, and if you want to turn a literal string into a literal LIKE expression you must perform this step even if you are using parameterised queries!

In this scheme, _ and % are special and must be escaped. The escape character must also be escaped. According to ANSI SQL, characters other than these must not be escaped: \' would be wrong. (Though MySQL will typically let you get away with it.)

Having done this, you proceed to the second level of escaping, which is plain old string literal escaping. This takes place outside of SQL, creating SQL, so must be done after the LIKE escaping step. For MySQL, this is mysql_real_escape_string as before; for other databases there will be a different function, or you can just use parameterised queries to avoid having to do it.
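As a rough illustration of the ordering, here is a minimal sketch of the two steps applied one after the other, assuming MySQL's default settings where backslash is the escape character at both levels. Both function names are hypothetical, and literal_escape_demo is only a simplified stand-in so the example runs without a database connection; it is not a substitute for mysql_real_escape_string.

```php
<?php
// Level 1: LIKE escaping, assuming the default escape character (backslash).
// strtr with an array replaces in a single pass, so nothing is escaped twice.
function like_escape_default(string $s): string {
    return strtr($s, ['\\' => '\\\\', '_' => '\\_', '%' => '\\%']);
}

// Level 2: simplified stand-in for string-literal escaping (illustration only).
// On a real connection use mysql_real_escape_string or a parameterised query.
function literal_escape_demo(string $s): string {
    return strtr($s, ['\\' => '\\\\', "'" => "\\'"]);
}

$input   = '100%';
$pattern = like_escape_default($input);    // 100\%   (the LIKE expression)
$literal = literal_escape_demo($pattern);  // 100\\%  (as written in the SQL text)

echo "... WHERE name LIKE '" . $literal . "'\n";
```

Swapping the order would escape the quote's backslash a second time in the LIKE step, which is why the LIKE step has to come first.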

The problem that leads to confusion here is that MySQL uses a backslash as an escape character for both of the nested escaping steps! So if you wanted to match a string against a literal percent sign you would have to double-backslash-escape and say LIKE 'something\\%'. Or, if that's in a PHP " literal which also uses backslash escaping, "LIKE 'something\\\\%'". Argh!

This is incorrect according to ANSI SQL, which says that: in string literals backslashes mean literal backslashes and the way to escape a single quote is ''; in LIKE expressions there is no escape character at all by default.

So if you want to LIKE-escape in a portable way, you should override the default (wrong) behaviour and specify your own escape character, using the LIKE ... ESCAPE ... construct. For sanity, we'll choose something other than the damn backslash!

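// LIKE-escape $s using $e as the escape character.
// The escape character itself is replaced first, so the $e added for _ and % is not doubled.
function like($s, $e) {
    // Concatenate rather than interpolate: "$e_" would be read as the variable $e_.
    return str_replace(array($e, '_', '%'), array($e.$e, $e.'_', $e.'%'), $s);
}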

$escapedname= mysql_real_escape_string(like($name, '='));
$query= "... WHERE name LIKE '%$escapedname%' ESCAPE '=' AND ...";

or with parameters (eg. in PDO):

$q= $db->prepare("... WHERE name LIKE ? ESCAPE '=' AND ...");
$q->bindValue(1, '%'.like($name, '=').'%', PDO::PARAM_STR);

(If you want more portability party time, you can also have fun trying to account for MS SQL Server and Sybase, where the [ character is also, incorrectly, special in a LIKE statement and has to be escaped. argh.)
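If you did need that, one way to go about it might be to let the caller pass extra characters to escape. This is only a sketch: like_escape and its $extra parameter are hypothetical names, not part of the function above, and it assumes the target database accepts the chosen escape character in front of [ when an ESCAPE clause is given.

```php
<?php
// Hypothetical variant of like(): escapes the escape character, _ and %,
// plus any extra characters the target database treats as special in LIKE.
function like_escape(string $s, string $e, array $extra = []): string {
    $map = [];
    foreach (array_merge([$e, '_', '%'], $extra) as $ch) {
        $map[$ch] = $e . $ch;
    }
    // strtr with an array works in a single pass, so added escapes are not re-escaped.
    return strtr($s, $map);
}

// MySQL-style: only the escape character, _ and % are special.
echo like_escape('50% [off]', '='), "\n";

// MS SQL Server / Sybase-style: also escape the opening bracket.
echo like_escape('50% [off]', '=', ['[']), "\n";
```

The first call leaves the bracket alone and the second prefixes it with =, so the same search term can be prepared for either kind of database.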

bobince
I would +1 again for "the damn backslash!".
BoltClock
Thanks, just absorbing this now... this is really helping me expand my basic knowledge. Stupidly, I was escaping % and _ even though I'm not actually using any LIKE statements, and since I think (please confirm) that % and _ are only wildcards in the context of a LIKE statement, I am in fact wasting my time. But then that makes me wonder why you would ever want to escape a % or _ when it's in the context of a LIKE statement. Surely the only reason to use a LIKE statement is so you can use its wildcard characters. (Please excuse my limited knowledge on this.)
Columbo
Sure, but it's perfectly natural to want to be able to search for a literal `%` or `_` character. If a user searches for `50%` in the front end, they probably mean they're looking for a string containing `50%` and not just any string with `50` in it.
bobince
Sorry, yes, of course, I see now. I will keep a bookmark on this post. Thanks for your help.
Columbo