This is the function I currently use (from a PHP book I bought):

function escape($data) {
 return mysql_real_escape_string(trim($data), $this->linkid); 
}

But I feel like it could be safer, for example by also using htmlspecialchars. It always makes me paranoid. I've read that mysql_real_escape_string is bad and should never be used, but I've also read that it's the best way. There is a lot of confusion about sanitizing data before inserting it into the database.

So how do you do it, and what are the pros and cons of your approach?

+1  A: 

There is no universal answer. It should always depend on what the data is that you're storing.

  • Is it supposed to be a number? Then run it through is_numeric (or similar)
  • Is it a string that's not allowed to contain HTML? Use htmlentities
  • etc.

Running all data through mysql_real_escape_string is a good idea. Of course, this also depends on whether your code uses a DB library, PDO, or something else.

For example, with PDO you would use $pdo->quote instead of the mysql function, and with Zend_Db's statements you need nothing extra, as it escapes things automatically for you.
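
The per-type check for a number could look roughly like the sketch below. The helper name parseQuantity and the sample inputs are made up for illustration; the idea is only to reject anything that is not numeric instead of trying to clean it.

```php
<?php
// Hypothetical helper: accept a form value only if it is numeric.
function parseQuantity($raw)
{
    $raw = trim($raw);
    if (!is_numeric($raw)) {
        return null; // reject rather than attempt to repair the value
    }
    return $raw + 0; // numeric string becomes an int or a float
}

var_dump(parseQuantity('42'));       // int(42)
var_dump(parseQuantity('3.5'));      // float(3.5)
var_dump(parseQuantity('42; DROP')); // NULL
```

A value that passes this check is a number and nothing else, but it should still reach the query through the driver's quoting or a bound parameter.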

Jani Hartikainen
I meant in a general way, for strings. Sorry, the question should've been more clear. "How do you escape/sanitize your strings before inserting them into the database"
lyrae
-1: This answer does not demonstrate a good understanding of the different TYPES of escaping and why it is SO important.
gahooa
+4  A: 

You're talking about two different types of escaping.

mysql_real_escape_string() escapes data so it'll be safe to send to MySQL.

htmlspecialchars() escapes data so it'll be safe to send to something that renders HTML.

Both work fine for their respective purposes, but parameterized queries via something like mysqli are quite a bit neater.
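
A small sketch of why the two are not interchangeable, using a made-up sample value. The flags are passed explicitly because the default flags of htmlspecialchars() differ between PHP versions.

```php
<?php
$name = "O'Reilly <b>";

// HTML escaping with ENT_COMPAT: <, >, & and double quotes are converted.
$html = htmlspecialchars($name, ENT_COMPAT, 'UTF-8');
echo $html, "\n"; // O'Reilly &lt;b&gt;

// The single quote that would end an SQL string literal is still there.
var_dump(strpos($html, "'") !== false); // bool(true)
```

So the output is safe to render as HTML text, but it says nothing about how MySQL will read the value inside a quoted string.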

ceejayoz
Right, but htmlspecialchars would convert " into &quot;. Wouldn't that prevent some injection, though?
lyrae
"parameterized queries via something like mysqli are quite a bit neater" IMHO even better: http://php.net/pdo
VolkerK
@VolkerK: Did not know about this. Nice. So far I have been using my own wrapper class for MySQL operations. Looks like I won't be needing it anymore. Trying out PDO and so far so good.
lyrae
A: 

Use SOAP? Har har.

(disclaimer: yes, this is a joke)

patros
A: 

Sanitize your data only right before you put it in a sensitive context, like:

  • part of SQL query
  • part of filename or path
  • part of a shell command
  • part of HTML output (or any other output, like CSV, XML, ATOM, etc, etc)

Don’t use one generic escape function, because you’ll then have the feeling that the data is safe when it isn’t. Its safety depends on the context, and you clearly cannot do all the escaping at once, independent of every situation in which the data may be used. So keep the raw data in the database (and yes, use mysql_real_escape_string() or some kind of parameter binding, using PDO for example) and use a specific escaping function when putting it into a context:

  • htmlspecialchars() when in HTML context
  • escapeshellarg() and escapeshellcmd() when in shell command context
  • etc, etc
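
As a rough sketch of the idea, here is one raw value (a made-up file name) escaped separately for two contexts at the moment it is used:

```php
<?php
// Raw value, as it would be kept in the database (hypothetical input).
$raw = 'report "Q1" <draft>.txt';

// HTML context: escape when writing it into a page.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Shell context: escape when passing it as a single command argument.
$forShell = escapeshellarg($raw);

echo $forHtml, "\n";  // report &quot;Q1&quot; &lt;draft&gt;.txt
echo $forShell, "\n"; // quoting style depends on the platform
```

Neither escaped form is stored; each is produced from the raw value only where that context needs it.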
Maciej Łebkowski
A: 

After having made sure the data was valid and/or well-formed (see Jani Hartikainen's comment), you really only need a call to PHP's built-in addslashes().

Chris
Doesn't that cause some issues, though? For example, I believe if the data contains a \, PHP will add another \ to it, making it \\. I think I have seen this cause issues on some sites, where quotes end up with a slash in front of them (i.e. "Hi, I\'m doing ok").
lyrae
Don't use addslashes() to sanitize data in a query. There are ways to bypass addslashes(), which opens the door to SQL injection. mysql_real_escape_string(), by contrast, takes the connection's character set into account, including multibyte ones.
Robbie Groenewoudt
Such as? (I'm not trolling, just trying to understand how addslashes() could be compromised, as I use it copiously)
Chris
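
The commonly cited bypass involves multibyte character sets such as GBK. addslashes() works byte by byte, so it puts a backslash (0x5C) in front of the quote (0x27) without looking at the byte before it. A sketch with made-up attacker input:

```php
<?php
// 0xBF followed by a single quote (0x27): attacker-supplied bytes.
$input = "\xbf\x27 OR 1=1";

$escaped = addslashes($input);

// First three bytes after escaping.
echo bin2hex(substr($escaped, 0, 3)), "\n"; // bf5c27
```

If the connection interprets the bytes as GBK, 0xBF 0x5C is read as one two-byte character, which swallows the backslash and leaves the 0x27 quote unescaped.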
+1  A: 

There actually is a "universal answer" for the metaproblem (safely storing user-provided data into a database) which is this: If you're not using bind parameters to avoid the whole injection issue to begin with, you're doing it wrong.

Cleaning data is a great idea, but the chance you'll miss something is high. So, whatever other methods you use (and Jani is right, it depends on the data), please don't neglect using bind variables.

Passed data should never hit a query without being bound.

Zenham
What are bind variables/parameters?
lyrae
Binding variables is the practice of using a prepared statement with placeholders to mark where data will be passed to the query, then calling execute(), through which the data is passed. The reason this is a secure way of passing data is context: the database server has already parsed and compiled the query, saving places for the data that will come later. The data is then passed into the engine as value data and is never parsed by the SQL tokenizer, meaning you've skipped the place where injection happens.
Zenham
Beyond that, if you're running a query multiple times, it saves having to tokenize the SQL statement multiple times, which is typically the slowest part of a query. An SQL statement once prepared can be executed repeatedly using different data values.
Zenham
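
A minimal sketch of prepare-then-execute with PDO. It assumes the pdo_sqlite extension and uses an in-memory database only to stay self-contained; the users table and the sample values are made up.

```php
<?php
// Assumption: pdo_sqlite is available; any PDO driver works the same way.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// Prepare once: the statement is parsed with a placeholder instead of data.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');

// Execute repeatedly: each value is bound as data, never parsed as SQL.
foreach (array("O'Reilly", "x'); DROP TABLE users; --") as $name) {
    $insert->execute(array(':name' => $name));
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn();
echo $count, "\n"; // 2
```

Both strings end up stored verbatim as names, and the table is still there afterwards.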
+3  A: 

Let's have a quick review of WHY escaping is needed in different contexts:

  • If you are in a quote-delimited string, you need to be able to escape the quotes.
  • If you are in XML, you need to separate "content" from "markup".
  • If you are in SQL, you need to separate "commands" from "data".
  • If you are on the command line, you need to separate "commands" from "data".

This is a really basic aspect of computing in general. Because the syntax that delimits data can occur IN THE DATA, there needs to be a way to differentiate the DATA from the SYNTAX, hence, escaping.

In web programming, the common escaping cases are:

  1. Outputting text into HTML
  2. Outputting data into HTML attributes
  3. Outputting HTML into HTML
  4. Inserting data into Javascript
  5. Inserting data into SQL
  6. Inserting data into a shell command

Each one has different security implications if handled incorrectly. THIS IS REALLY IMPORTANT! Let's review this in the context of PHP:

  1. Text into HTML: htmlspecialchars(...)

  2. Data into HTML attributes: htmlspecialchars(..., ENT_QUOTES)

  3. HTML into HTML: use a library such as HTMLPurifier to ENSURE that only valid tags are present.

  4. Data into Javascript: I prefer json_encode. If you are placing it in an attribute, you still need to apply #2 to the encoded value.

  5. Inserting data into SQL: each driver has an escape function of some sort, and using it is best. If you are running in a normal latin1 character set, addslashes(...) is suitable. *mysql_real_escape_string() is better.* Don't forget the quotes AROUND the addslashes() call:

    "INSERT INTO table1 SET field1 = '" . addslashes($data) . "'"

  6. Data on the command line: escapeshellarg() and escapeshellcmd() -- read the manual
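
For case 4, a sketch of what this might look like, with made-up data and a hypothetical JavaScript function show(). The JSON_HEX_* flags are an extra precaution that is not part of the list above.

```php
<?php
$data = array('msg' => "</script><script>alert('x')</script>");

// Inside a <script> block: JSON-encode, hex-escaping characters HTML cares about.
$js = json_encode($data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
echo "<script>var payload = $js;</script>\n";

// Inside an attribute: JSON-encode first, then apply the attribute rule (#2).
$attr = htmlspecialchars(json_encode($data), ENT_QUOTES, 'UTF-8');
echo "<a href=\"#\" onclick=\"show($attr)\">show</a>\n";
```

In the first case the encoded value contains no < or >, so it cannot close the script block early; in the second, the quotes in the JSON are converted to entities, so they cannot end the attribute.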

-- Take these to heart, and you will eliminate 95%* of common web security risks! (* a guess)

gahooa
A: 
  1. In general, use filter_var()

  2. In cases where only very specific formats or values are allowed it may be better to use regexes or in_array() of valid values.

  3. Remember that "input" means any source of input that you don't directly control.

  4. If the input is going into a query, use prepared statements (e.g., mysqli)
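
Points 1 and 2 might be sketched like this. The inputs and the list of allowed sort columns are made up for illustration.

```php
<?php
// 1. filter_var() for common formats: returns the value, or false if invalid.
$age   = filter_var('42', FILTER_VALIDATE_INT);
$bad   = filter_var('42abc', FILTER_VALIDATE_INT);
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);

var_dump($age); // int(42)
var_dump($bad); // bool(false)

// 2. Whitelist of allowed values, compared strictly.
$allowedSort = array('name', 'date');
$requested   = 'date; DROP TABLE x';
$sort = in_array($requested, $allowedSort, true) ? $requested : 'name';

echo $sort, "\n"; // name
```

A whitelist is handy for things like column names, which cannot be passed as bound parameters in a prepared statement.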

Clayton