views:

1173

answers:

7

PHP uses "magic quotes" by default but has gotten a lot of flak for it. I understand it will disable it in next major version of PHP.

While the arguments against it makes sense, what I don't get it is why not just use the HTML entities to represent quotes instead of stripping and removing slashes? After all, a VAST majority of mySQL is used for outputting to web browsers?

For example, ' is used instead of ' and it won't affect the database at all.

Another question, why can't PHP just have configurations set up for each version of PHP with this tag <?php4 or <?php5 so appropriate interpreters can be loaded for those versions?

Just curious. :)

+2  A: 

I will reply to your first question only.

Validation of input is a wrong approach anyway, because it's not input that matters, the problem is where it's used. PHP can't assume that all input to a MySQL query would be output to a context where a HTML Entity would make sense.

It's nice to see that magic_quotes is going; it's the cause of a lot of security issues with PHP, and it's nice to see them taking a new approach :)

You'll do yourself a big favour if you reframe your validation approaches to validate on OUTPUT, for the context you are working in. Only you, as the programmer, can know this.

Noon Silk
+1  A: 

The reason that MySQL doesn't convert ' to &#039; is because &#039; is not '. If you want to convert your data for output, you should be doing that at the view layer, not in your database. It's really not very hard to just call htmlentities before/when you echo.

nickf
+9  A: 

Putting &#039; into a string column in a database would be fine, if all you use database content for is outputting to a web page. But that's not true.

It's better to escape output at the time you output it. That's the only time you know for sure that the output is going to a web page -- not a log file, an email, or other destination.

PS: PHP already turns magic quotes off by default in the standard php.ini file. It's deprecated in PHP 5.3, and it will be removed from the language entirely in PHP 6.0.

Bill Karwin
However, since I wrote this, PHP 6.0 has been totally scrapped.
Bill Karwin
A: 

Thanks everyone. I had to REALLY think what you meant and the implications it may have if I change the quotes to HTML entities instead of adding slashes to them but again, isn't that actually changing the output/input too?

I cannot think of a reason why we CANNOT or SHOULDN'T use HTML entities for mySQL as long as we make it clear that all data is encoded using HTML entities. After all, my argument is based on a fact that the majority of mySQL is used for outputting to HTML browsers and also the fact that ' and " and / can seriously harm mySQL databases. So, isn't it actually SAFER to encode ' and " and / as HTML entities before sending them as INSERT queries? Also, we're going XML so why waste time writing htmlentities and stripslashes and addslashes when accessing data that's ALREADY encoded in HTML entities?

netrox
Although the other answerers make a valid point about separation of concerns, your approach is workable. As long as you (and everyone else) understands the data in the database is encoded in this way, everything will be fine. Sure, if you want to send a text-email of the data, you just convert the HTML entities out. As long as it's understood, OK.The downside is MySQL doesn't understand data to be encoded like this. It's collating, searching and string functions assume a certain encoding, like UTF8 or latin1. They won't behave as you'd expect when there are entities. Is that OK?
ndp
Yeah, the important thing is to be *consistent*. As long as you know how you're going to use the data, and plan to un-htmlentities the strings when you don't present it as HTML or XML, then do as you please.
Bill Karwin
The point is you don't encode on the way IN, you encode on the way OUT. You need to understand this, and once you do, you will write much more secure apps.
Noon Silk
Oh yes, thank you all! :) I didn't think about the possible side effects of mySQL functions misinterpreting data when HTML entities are used!
netrox
+1  A: 

Here's a good reason, mostly in response to your own posted answer: Using htmlspecialchars() or htmlentities() does not make your SQL query safe. That's what mysql_real_escape_string() is for.

You seem to be making the assumption that it's only the single and double quote characters that pose a problem. MySQL queries are actually vulnerable to the \x00, \n, \r, \, ', " and \x1a characters in your data. If you are not using prepared statements or mysql_real_escape_string(), then you have an SQL injection vulnerability.

htmlspecialchars() and htmlentities() do not convert all of these characters, ergo you cannot make your query safe by using these functions. To that end, addslashes() does not make your query safe either!

Other smaller downsides include what the other posters have already mentioned about MySQL not always being used for web content, as well as the fact that you are increasing the amount of storage and index space needed for your data (consider one byte of storage for a quote character, versus six or more bytes of storage for its entity form).

zombat
Thanks! I totally UNDERSTAND now! :) I will use mysql_real_escape_string() function from now on.
netrox
A: 

You can't just convert ' to &#039;. Think about it: what happens when you want to store the string "&#039;"? If you store &#039; then when you load the page it will display ' and not &#039;.

So now you have to convert ALL HTML entities, not just quotes. Then you start getting into all sorts of weird conversion problems. The simplest solution is to just store the real data in the database, then you can display it how you like. You might want to use the actual quotes - in most cases " and ' don't do any harm outside of the tag brackets.

Sometimes you may want to store actual HTML in a field and display it raw (as long as it's checked and sanitized on its way in/out.

DisgruntledGoat
A: 

I understand that this is an older thread, but maybe somebody could clarify this for me.

If I use mysql_real_escape_string and the user input contains a single quote, the database will store that as "\'", ie. adding the slash to the single quote. Obviously, I can remove that slash when I output this data with stripslashes. However, it has been said I should store the real data as is, ie. the single quote without the slash. My colleague tells me Oracle would work that way, but I cannot get it to work in MySQL.

Question: should a single quote be stored as single quote in the database or as single quote with a backslash in front? If no slash should be there, how do I achieve this?

Currently, I take my user input and run stripslashes (to get rid of extra slashes) on it and then mysql_real_escape_string $value=mysql_real_escape_string(stripslashes(trim($_POST[userinput])));

When I output, I again run stripslashes.

Thanks, Jens

Jens
`mysql_real_escape_string()` doesn't cause data to be stored with slashes, it simply uses slashes to escape the data on the way into the database. So there's no need to `stripslashes` on the way out. By excessively stripping slashes you could be altering data that is intended to have them. Only use stripslashes when a) you've added slashes yourself. b) data has been escaped using a different process and now contains slashes c) `magic_quotes_gpc` is switched on in your php config file.