I have a simple text box in a form and I want to safely store special characters in the database after a POST or GET. I use the code below.

$text=mysql_real_escape_string(htmlspecialchars_decode(stripslashes(trim($_GET["text"])),ENT_QUOTES));

When I read the text from the database and put it in the text box value, I use the code below.

$text=htmlspecialchars($text_from_DB,ENT_QUOTES,'UTF-8',false);
<input type="text" value="<?=$text?>" />

I am trying to save the text in the database without entity-encoded special characters (meaning I don't want to write &quot; or &#039; into the database field).

So when writing to the database, I apply htmlspecialchars_decode to the text.

When writing to the form text box, I apply htmlspecialchars to the text.

Is this the best approach for safely writing special characters to the database?

A: 

When you write to the database, use htmlentities, and when you read the value back, use the html_entity_decode function.

As a side note, if you are looking for some security, use mysql_real_escape_string for strings and intval for numbers.

Sarfraz
A: 

The best approach to safely writing to a DB is to use the PDO abstraction layer and make use of prepared statements.

http://www.php.net/manual/en/intro.pdo.php

A good tutorial (I learned from this one) is

http://www.phpro.org/tutorials/Introduction-to-PHP-PDO.html

However, you might have to rewrite a lot of your site just to implement this. But it is without doubt a more elegant method than having to make use of all those functions. Plus, prepared statements are becoming the de facto standard now. Another benefit is that you may not have to rewrite your queries if you switch to a different database (such as from MySQL to PostgreSQL). But I would say consider this if you plan to scale your site.
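
For illustration, here is a minimal sketch of what a prepared statement looks like with PDO. It assumes the pdo_sqlite driver is available and uses an in-memory SQLite database only so that the snippet is self-contained; the `notes` table and `body` column are made up for the example. With MySQL, mainly the DSN passed to the constructor would differ.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for MySQL so the
// example is self-contained. Table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)');

// Raw user input containing quotes; no escaping or entity encoding applied.
$text = "test's \"quoted\" text";

// The value is bound as a parameter, never spliced into the SQL string.
$insert = $pdo->prepare('INSERT INTO notes (body) VALUES (:body)');
$insert->execute([':body' => $text]);

// Read it back, again passing the value as a bound parameter.
$select = $pdo->prepare('SELECT body FROM notes WHERE body = :body');
$select->execute([':body' => $text]);
$stored = $select->fetchColumn();

// The stored value should be the raw string, quotes included.
var_dump($stored === $text);
```

The quotes in `$text` need no manual escaping here, because the value travels separately from the SQL text.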

Axsuul
I am familiar with PDO, but our server does not support it.
ntan
+2  A: 

You have the right idea of keeping the text in the database as raw. Not sure what all the HTML entity stuff is for; you shouldn't need to be doing that for a database insertion.

[The only reason I can think of why you might try to entity-decode incoming input for the database would be if you find you are getting character references like &#352; in your form submission input. If that's happening, it's because the user is inputting characters that don't exist in the encoding used by the page with the form. This form of encoding is totally bogus because you then can't distinguish between the user typing Š and literally typing &#352;! You should avoid this by using the UTF-8 encoding for all your pages and content, as every possible character fits in this encoding.]

Strings in your script should always be raw text with no escaping. That means you don't do anything to them until the time you output them into a context that isn't plain-text. So for putting them into an SQL string:

$category= trim($_POST['category']);
mysql_query("SELECT * FROM things WHERE category='".mysql_real_escape_string($category)."'");

(or use parameterised queries to avoid having to manually escape it.) When putting content into HTML:

<input type="text" name="category" value="<?php echo htmlspecialchars($category); ?>" />

(you can define a helper function with a shorter name like function h($s) { echo htmlspecialchars($s, ENT_QUOTES); } if you want to cut down on the amount of typing you have to do in templates.)

And... that's pretty much it. You don't need to process strings that come out of the database, as they're already raw strings. You don't need to process input strings(*), other than any application-specific field validation you want to do.
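
A small sketch of that round trip, under one loud assumption: `htmlspecialchars_decode` is used here purely as a stand-in for what the browser does when it parses the attribute. A real script would never call it on form input, because the browser has already done the decoding. The sample string is made up.

```php
<?php
// Raw string as it would sit in the script or the database.
$category = "test's <b>categories</b> & \"more\"";

// Escape only at the moment of writing into HTML.
$attr = htmlspecialchars($category, ENT_QUOTES, 'UTF-8');
$html = '<input type="text" name="category" value="' . $attr . '" />';

// Stand-in for the browser (assumption for this demo only): it decodes the
// attribute while parsing, so the form field holds, and later submits,
// the raw text again.
$submitted = htmlspecialchars_decode($attr, ENT_QUOTES);

echo $html, "\n";
var_dump($submitted === $category);
```

The escaped form exists only inside the HTML source; the value that comes back on submission should match the raw string.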

*: well, except if magic_quotes_gpc is turned on, in which case you do either need to stripslashes() everything that comes in from get/post/cookie, or, my favoured option, just immediately fail:

if (get_magic_quotes_gpc())
    die(
        'Magic quotes are turned on. They are utterly bogus and no-one should use them. '.
        'Turn them off, you idiot, or I refuse to run. So there!'
    );
bobince
I think that with the category example you wrote above, the first time the user writes test's categories it will be written to the database as raw. When editing the category (because of htmlspecialchars($category)) it will be written as test&#039;s categories, but I prefer it written as raw. Correct me if I'm wrong. Thanks for your reply.
ntan
`htmlspecialchars($category)` will be *written* to the HTML as `test&#039;s categories`, yes. But that's just an encoding for the browser to read; the actual value of the resulting form field in the DOM is `test's categories`, and it is this unencoded string you will get back in your `$_POST['category']` when the form is submitted.
bobince
So when I want to search the tables, should I convert the user input with htmlspecialchars or not? I mean there should be ONE policy for writing to the DB. I cannot write 10 records in raw format and 10 others with htmlspecialchars. I prefer raw format.
ntan
Indeed. Raw is always best. Having HTML-encoded content in the database can mess up indexing, comparisons, truncations and other string processing.
bobince
So for writing to the DB: mysql_real_escape_string(htmlspecialchars_decode(stripslashes(trim($_GET["text"])),ENT_QUOTES)); I use trim for spaces, stripslashes in case magic_quotes_gpc=on, htmlspecialchars_decode to decode the characters from the input, and mysql_real_escape_string for escaping and security. I think this is a good approach, but I don't know if it is the best.
ntan
You shouldn't use `stripslashes` there. If `magic_quotes` is on (and today it should never, never be, which is why I recommended immediately quitting if you discovered it turned on) then you need to do the `$_GET`/`$_POST` unescaping right at the start of the script, not at database insertion time. `htmlspecialchars_decode` has no place here; I have no idea why you are including it.
bobince
htmlspecialchars_decode is there to decode the input field. If I am editing a text field containing test&#039;s categories, I decode it so that it is written in raw format and not with &#039;. I agree with the others, and for the record the line of code is only there to demonstrate the process; of course I don't write to the DB that way (I array_map the process).
ntan
**It is already in raw format** when you get it out of `$_GET`. If you have `&#39;` in a form submission, that's because the user deliberately typed ampersand-hash-three-nine-semicolon into the field and you should leave it like that.
bobince
I have &#39; because of <input type="text" name="category" value="<?php echo htmlspecialchars($category); ?>" />, not because the user deliberately typed it. Correct me if I'm wrong.
ntan
You really don't. Try it. Put `<input name="x" value="&#39;" />` in a form. Notice how it displays an apostrophe in the field in the browser, and sends an apostrophe back to PHP when you submit it. HTML-escaping is only a way to get out-of-band characters into the browser's Document Object Model. Once they are there, the escapes are never seen again.
bobince
+1  A: 

I'd like to point out a couple of things:

  1. there is nothing wrong with saving characters like ' and " in a database. SQL injections are just a matter of string manipulation; they actually have nothing to do with SQL or databases -- the problem lies only in how the query string is built. If you want to write your own queries (not recommended) you don't have to encode every apostrophe or double quote: just escape them once to build a safe string, and save them in the database. A better approach is using PDO as mentioned, or using the mysqli extension, which allows queries with prepared statements

  2. htmlentities() and similar functions should be used when sending data as output to the browser, not for encoding data to be stored in a database for at least two reasons: first of all it's useless, the DB doesn't care about html entities, it just contains data; secondly you should always treat data coming from the database as potentially insecure, so you should save it in "raw" format and encode it when using it.
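
To illustrate the second point, here is a rough sketch of what goes wrong when the entity-encoded form is what gets stored. The sample string is made up, and plain PHP string functions stand in for the comparisons and truncations a database would perform.

```php
<?php
$raw     = "test's categories";
$encoded = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // test&#039;s categories

// Comparison: a search for the raw text no longer matches the stored value.
$matches = ($raw === $encoded);

// Length: the encoded form is longer than what the user actually typed.
$rawLength     = strlen($raw);
$encodedLength = strlen($encoded);

// Truncation: cutting the encoded form can split an entity in half.
$truncated = substr($encoded, 0, 7);

var_dump($matches, $rawLength, $encodedLength, $truncated);
```

Storing `$raw` and calling `htmlspecialchars` only at output time avoids all three problems.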

kemp
Totally agree with you but our server does not support PDO
ntan
That's why I mentioned `mysqli` too. The rest of the points still stand though.
kemp