views:

615

answers:

3

I'm inserting some data into a database from a form. I'm using addslashes to escape the text (have also tried mysql_real_escape_string with the same result).

Regular quotes are escaped, but some other quotes are not. For example, the string:

Homer's blood becomes the secret ingredient in Moe’s new beer.

is converted to:

Homer\'s blood becomes the secret ingredient in Moe’s new beer.

I didn't think the curly quote would matter unescaped, but only this text is inserted into the database:

Homer's blood becomes the secret ingredient in Moe

So PHP thinks the curly quote is fine, but MySQL is losing the string. MySQL is not giving any errors though.

+4  A: 

I would look for a mismatch between the character encoding used in your Web interface and that used at the database level. If your Web interface uses UTF-8, for example, and your database is using the default MySQL encoding of latin1, then you need to set up your tables with DEFAULT CHARSET=utf8.

Use mysql_real_escape_string() or mysqli, by the way. addslashes() is NOT adequate protection against SQL injection.

chaos
+1 addslashes should never be used for anything. It will indeed be a character set problem; my guess given that the quote characters are actually disappearing is that they're ISO-8859-1 bytes being inserted into a UTF-8 database. Really you want to aim to have everything in UTF-8; start by serving your pages with that encoding, and that will ensure submitted forms also come in in UTF-8.
bobince
Yeah it was because the web page wasn't UTF8 but MySQL was. Side question though: is there an opposite function to `mysql_real_escape_string`? I couldn't find anything in the manual.
DisgruntledGoat
Nope. Probably because it's kind of hard to imagine why you'd need one. If you're pulling data back out of MySQL, hopefully it's obvious that you don't need to reverse the escaping. If for some reason you need the original data before you push it into MySQL, just don't get rid of the original data when you make your escaped version.
chaos
It was because I had a function that would recursively add/remove slashes (see http://stackoverflow.com/questions/1216552/php-pass-by-reference-in-recursive-function-not-working). The idea was to bring all form data to a consistent state after submitting, e.g. remove magic quotes, process the data if necessary, then add them back for a MySQL query. I think I'll soon be switching to parameterized queries though, it saves a lot of headache!!
DisgruntledGoat
Ahh, okay. IMO the only worthwhile thing to do with magic quotes is remove them, so you just need `stripslashes()` for that. :)
chaos
A: 

I haven't encountered this personally but is it possible this has something to do with the character set of the table? Is the character set UTF8 and, if not, are you able to make it UTF8? If it is and it still happens double check the column, you can set the character set per-column.

Steven Surowiec
+4  A: 

The ’ in Moe’s is the only character in your example string that wouldn't be valid if that string is latin1 encoded but your mysql server expects utf8.

Simple demonstration:

<?php
function foo($s) {
    echo 'len=', strlen($s), ' ';
  for($i=0; $i<strlen($s); $i++) {
    printf('%02X ', ord($s[$i]));
  }
  echo "\n";
}

 // my file is latin1 encoded and so is the string literal
foo('Moe’s');
// now try it with an utf8 encoded string
foo( utf8_encode('Moe’s') );

prints

len=5 4D 6F 65 92 73
len=6 4D 6F 65 C2 92 73

Therefore the question is: Do you feed the mysql server something in a "wrong" encoding?
Each connection has a connection charset and the mysql server expects your client (php script) to send data that is encoded in that character set. You can find out what the connection charset is with

SHOW VARIABLES LIKE '%character%'

like in

$mysql = mysql_connect('..', '..', '..') or die(mysql_error());
mysql_select_db('..', $mysql) or die(mysql_error());

$query = "SHOW VARIABLES like '%character%'";
$result = mysql_query($query, $mysql) or die(__LINE__.mysql_error());
while( false!==($row=mysql_fetch_array($result, MYSQL_ASSOC)) ) {
  echo join(', ', $row), "\n";
}

This should print something like

character_set_client, utf8
character_set_connection, utf8
character_set_database, latin1
character_set_filesystem, binary
character_set_results, utf8
character_set_server, utf8
character_set_system, utf8

and character_set_connection, utf8 indicates that "my" connection character set is utf8, i.e. the mysql server expects utf8 encoded characters from the client (php). What's "your" connection charset?

Then take a look at the actual encoding of your parameter string, i.e. if you had

$foo = mysql_real_escape_string($_POST['foo'], $mysql);

replace that by

echo '<div>Debug hex($_POST[foo])=';
for($i=0; $i<strlen($s); $i++) {
    printf('%02X ', ord($_POST['foo'][$i]));
}
echo "</div>\n";
$foo = mysql_real_escape_string($_POST['foo'], $mysql);

and check what the actual encoding of your input string is. Does it print 92 or C2 92?

VolkerK