views:

318

answers:

8

I am very new to PHP/programming, with that in mind I am trying to come up with a function that I can pass all my strings through to sanatize. So that the string that comes out of it will be safe for database insertion. But there are so many filtering functions out there I am not sure which ones I should use/need. Please help me fill in the blanks:

function filterThis($string) {
    $string = mysql_real_escape_string($string);
    $string = htmlentities($string);
    etc...
    return $string;
}
+1  A: 

It depends on the kind of data you are using. The general best one to use would be mysql_real_escape_string but, for example, you know there won't be HTML content, using strip_tags will add extra security.

You can also remove characters you know shouldn't be allowed.

Aaron Harun
A: 

All you really need to do to make a string safe for database storing is to pass it through mysql_real_escape_string().

Here is a nice tutorial about the PHP string filtering functions: http://www.w3schools.com/php/php_filter.asp

Also this is unrelated to the question but this would be better convention for your code:

function filterThis($string) {
    return mysql_real_escape_string(htmlentities($string));
}
Gnarly
I really don't understand why you're at -1 right now. Technically you're pretty much correct.
Joe Philllips
weird function.
Col. Shrapnel
W3Schools is generally awful, when it comes to PHP, it is positively dangerous. Also, generally speaking, you shouldn't be taking input, converting it to HTML, then storing it in the database. The original input should be stored, and any HTML conversion should take place just before you insert it into an HTML document.
David Dorward
@David Dorward He's not condoning the use of htmlentities there... he's just showing a better way to write the version of the ops function (which is inherently wrong in the first place). It seems clear to me when he says "...but this would be better convention for your code"
Joe Philllips
-1 That's a pretty weak answer for someone who claims 4 years of programming experience.
NullUserException
@NullUserException - In case you haven't realized, PHP isn't the only programming language on the planet.
Gnarly
A: 

You use mysql_real_escape_string() in code like

$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
  mysql_real_escape_string($user),
  mysql_real_escape_string($password)
);

htmlentities() is used to convert some characters in entities, when you output a string in a HTML content.

kiamlaluno
+5  A: 

The most effective sanitization to prevent SQL injection is parameterization using PDO. Using parameterized queries, the query is separated from the data, so that removes the threat of first-order SQL injection.

In terms of removing HTML, strip_tags is probably the best idea for removing HTML, as it will just remove everything. htmlentities does what it sounds like, so that works, too. If you need to parse which HTML to permit (that is, you want to allow some tags), you should use an mature existing parser such as HTML Purifier

Zurahn
+1 for PDO and HTML Purifier
ggfan
Aw man, I wrote up that giant wall of text just because I didn't see anyone mention HTML Purifier, and here you beat me by like 40 minutes. ;)
Charles
Shouldn't you only strip HTML on output? IMO you should never change input data - you never know when you'll need it
Joe Philllips
+1  A: 

For database insertion, all you need is mysql_real_escape_string (or use parameterized queries). You generally don't want to alter data before saving it, which is what would happen if you used htmlentities. That would lead to a garbled mess later on when you ran it through htmlentities again to display it somewhere on a webpage.

Use htmlentities when you are displaying the data on a webpage somewhere.

Somewhat related, if you are sending submitted data somewhere in an email, like with a contact form for instance, be sure to strip newlines from any data that will be used in the header (like the From: name and email address, subect, etc)

$input = preg_replace('/\s+/', ' ', $input);

If you don't do this it's just a matter of time before the spam bots find your form and abuse it, I've learned the hard way.

Rob
+17  A: 

Stop!

You're making a mistake here. Oh, no, you've picked the right PHP function calls to make data a bit safer, that's fine. Your mistake is one of order of operations and the intent of the functions.

When users submit data, you need to make sure that the data is the form you expect. If you expect something to be a number, make sure it's a number. If it needs to be an integer between 1 and 10, make sure it's really an integer between 1 and 10. If it appeared in a drop-down menu, make sure that the submitted value would have appeared in the menu. Same for radio buttons. If the field shouldn't have HTML in it, make sure to remove HTML or neutralize it. If the field should have HTML in it, make sure only the parts of HTML that you like are included.

Check out the featureful but weird feeling built-in filter functions, and learn to love regexes.

This is data validation. Validation is not the same thing as sanitization. You need both!

When you insert data into the database, that's when you need to sanitize. Every single database API does it differently. You've already discovered mysql_real_escape_string, but that's not a good thing. The "mysql" API lacks a feature called prepared staements. Prepared statements let you use placeholders in your query:

SELECT ... FROM ... WHERE fieldname = ? -- That question mark is a placeholder

When working with placeholders, the very act of filling in the placeholder automatically sanitizes the data for you. If you're working with MySQL, check out the "mysql*i*" extension (prepare, bind, execute) and the PDO extension (prepare, bind, execute). Learning PDO will be worth it if you expect to work with other database types in the future, like SQLite or Postgres.

Yes, it's a bit more work than putting the string together yourself, but the end result is code you know is more secure against SQL injection.

When displaying data to the user, you need to make sure that malicious bits haven't found their way in. Unless you know that the data is completely safe and sane (numbers pulled from a database, for example), you should run pretty much everything through htmlspecialchars, unless you know it contains only safe or pre-sanitized HTML.

Overall, you need to remember to use the right type of data filtering on the right data at the right time. Don't run database escaping code on variables that will never see the database. HTML filtering is entirely unnecessary if the field shouldn't contain HTML, but must contain a value you can validate, like a number or something from a select menu.


Addendum: Others recommend htmlentities instead of htmlspecialchars. htmlentities turns HTML characters into entities, and then goes one step further and also turns things like accented characters into entities. This might not be what you want. In fact, if you've set up your character sets correctly, you probably don't need to do that. Keep this in mind.

Charles
Make sure to specify the character encoding when using htmlentities
Joe Philllips
And when specifying it, be sure it's on the list of supported encodings.
Charles
And do not use htmlentities at all, replace it with htmlspecialchars in purpose of replacing just `<>`, not every character to it's entity
Col. Shrapnel
Just be sure to not call `htmlspecialchars` twice, because he speaks of it in the "When users submit data part" and in the "When displaying the data" part.
Savageman
+2  A: 

Database Input - How to prevent SQL Injection

  1. Check to make sure data of type integer, for example, is valid by ensuring it actually is an integer
    • In the case of non-strings you need to ensure that the data actually is the correct type
    • In the case of strings you need to make sure the string is surrounded by quotes in the query (obviously, otherwise it wouldn't even work)
  2. Enter the value into the database while avoiding SQL injection (mysql_real_escape_string or parameterized queries)
  3. When Retrieving the value from the database be sure to avoid Cross Site Scripting attacks by making sure HTML can't be injected into the page (htmlspecialchars)

You need to escape user input before inserting or updating it into the database. Here is an older way to do it. You would want to use parameterized queries now (probably from the PDO class).

$mysql['username'] = mysql_real_escape_string($clean['username']);
$sql = "SELECT * FROM userlist WHERE username = '{$mysql['username']}'";
$result = mysql_query($sql);

Output from database - How to prevent XSS (Cross Site Scripting)

Use htmlspecialchars() only when outputting data from the database. The same applies for HTML Purifier. Example:

$html['username'] = htmlspecialchars($clean['username'])

And Finally... what you requested

I must point out that if you use PDO objects with parameterized queries (the proper way to do it) then there really is no easy way to achieve this easily. But if you use the old 'mysql' way then this is what you would need.

function filterThis($string) {
    return mysql_real_escape_string($string);
}
Joe Philllips
well well, not even upvote or thanks for the info you took from my answer :)
Col. Shrapnel
@Col I'm not going to upvote an unclear/unhelpful answer, sorry. Feel free to add to it to make it more useful and I'll happily upvote it.
Joe Philllips
laff i am not begging for an upvote, as you probably think. I am talking of honesty and gratitude. But I am not forcing you too. Just curios at ppl around
Col. Shrapnel
A: 

My 5 cents.

Nobody here understands the way mysql_real_escape_string works. This function do not filter or "sanitize" anything.
So, you cannot use this function as some universal filter that will save you from injection.
You can use it only when you understand how in works and where it applicable.

I have the answer to the very similar question I wrote already: http://stackoverflow.com/questions/2993027/in-php-when-submitting-strings-to-the-db-should-i-take-care-of-illegal-characters/2995163#2995163
Please click for the full explanation for the database side safety.

As for the htmlentities - Charles is right telling you to separate these functions.
Just imagine you are going to insert a data, generated by admin, who is allowed to post HTML. your function will spoil it.

Though I'd advise against htmlentities. This function become obsoleted long time ago. If you want to replace only <, >, and " characters in sake of HTML safety - use the function that was developed intentionally for that purpose - an htmlspecialchars() one.

Col. Shrapnel
`mysql_real_escape_string` escapes needed characters inside a string. It's not strictly filtering or sanitizing, but enclosing a string in quotes neither is (and everybody does it, I pretty much never saw a question about it).So nothing is sanitized when we write SQL? Of course not. What prevents the SQL injection is the use of `mysql_real_escape_string`. Also the enclosing quotes, but everybody does it, and if you test what you do, you end up with a SQL syntax error with this omission. The real dangerous part is handled with `mysql_real_escape_string`.
Savageman
@Savageman sorry pal, you understand not a thing. You do not understand the way mysql_real_escape_string works. These "needed characters" ARE quotes. Not this function nor quotes alone sanitizes anything. These 2 things works **together** only. Making query string just syntactically correct, not "safe from injection". And what syntax error I would get for just `WHERE id = 1`? ;)
Col. Shrapnel
That's what I call usual PHP'ish ignorance... That saddens me
Col. Shrapnel
Try `WHERE my_field = two words` (without quotes) to get the syntax error. Your example is bad because it doesn't need quotes neither escaping, just a numeric check.Also I didn't say the quotes were useless. I said everyone use them so this is not the source of problems regarding SQL injection.
Savageman
@Savageman so, that I said: *You can use it only when you understand how it works and where it applicable.* You have just admitted that mysql_real_escape_string is not applicable everywhere. As for `everyone use them` you can check codes here on SO. Many people do not use quotes with numbers. Go figure. An please, bear in mind that I am not discuss here what you have said and wht you don't. I am just explain basic database safety rules. You'd better learn instead of empty argue. *Nobody* mentioned quotes or casting here but m_r_e_s only as though it's magic. What I am talking about
Col. Shrapnel
"For database insertion, **all you need is mysql_real_escape_string"** from below is an example. I have just said it is not true. And you'd be a fool arguing that.
Col. Shrapnel
Yeah, you're right, but I think you make a big deal for something quite small. Quotes won't be forgotten, as they rise big problems (such as syntax error) very quickly. They won't go un-noticed, unlike `mysql_real_escape_string()`. If you forget `mysql_real_escape_string()`, you probably won't ever find a problem until you're attacked.
Savageman
@Savageman did you ever notice an example I have provided for you, **WHERE my_field = 234** one? **What syntax error it will cause?**
Col. Shrapnel
Yes. Please read again my post. I didn't say it ALWAYS throw an error.
Savageman
@Savageman You have said `Quotes won't be forgotten.` I've got you an example where it can be. Any questions?
Col. Shrapnel
Ahah. You're playing me. :DEither your field is INT and quotes are not needed (assuming you ensure the var is a numeric value), either it's not and missing quotes will be discovered during the testing.
Savageman
`(assuming you ensure the var is a numeric value)` - that's the problem. None mentioned it, neither you. You assume it is obvious but it is **not.** And it is not making a big deal of small. It's source of zillions of injections. I am not fucking playing. I am explaining you very basic things and god, I how am tired of that.
Col. Shrapnel
@Savageman Well, to put it to end. You have said that `The real dangerous part is handled with mysql_real_escape_string.` It is not true. Because The real dangerous part is handled with **mysql_real_escape_string combined with quotes only.** You have said `Everyone uses quotes.` It is not true and proved by an example. There are shitload of codes in the world where numbers not cast not quoted|escaped. It's a fact. So, you have nothing to agrue in my post in real. Now get lost.
Col. Shrapnel
Rest in peace. Amen. (I mistakenly voted up your comment, but that's not a big deal).Ensuring a value is numeric is not really related to the quote|escape problem. It's something different and we should not have one global way to treat different types of variables.
Savageman
@Savageman all this is already precisely explained in the post i've linked to. The WHOLE FUCKING SET of sanitization rules. And `Ensuring a value is numeric` is related to this. To sanitization, of which was the **opening post,** if you didn't notice that. And I am still do not understand what is making you so uneasy and disagree and what is the point of all your comments.
Col. Shrapnel