In PHP, what is a list of potentially harmful characters that can be used to break a PHP page? And, using regular expressions, how can I filter out the bad sequence of characters from all of my user input?

For example, to check whether an email is valid I would use the line below:

preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/', $email);

The line above checks the email against a specific pattern.

But, just as you check whether an email is valid using a regular expression, how would I check whether the input contains any invalid character patterns, using one expression for every input? I would like to place this line at the very top of any PHP page that accepts $_GET or $_POST data, to prevent hacker-like inputs from crashing the page.

Hope this makes sense. Thank you PG

A: 

There are much better ways to clean input. The built-in function strip_tags will be faster.
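To make the suggestion concrete, here is a minimal sketch of what strip_tags() does. The input string is made up for illustration. Note that it only removes markup and keeps the text between the tags, so it is not a substitute for escaping in SQL or HTML contexts.

```php
<?php
// Hypothetical input, for illustration only.
$input = '<b>Hello</b> <script>alert(1)</script>';

// strip_tags() removes the tags themselves but keeps the text between them.
$clean = strip_tags($input);

echo $clean, "\n";
```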

Devin Ceartas
+4  A: 

There is no "one and only" way of filtering input like you describe, since no input is inherently invalid or even necessarily malicious. It's entirely what you do with the input that matters.

For example, suppose you have some text in $_GET['field'] and you are about to compose a SQL query. You need to escape the value using mysql_real_escape_string() (for MySQL, of course) like so:

$sql = "INSERT INTO some_table (some_field) VALUES ('" . mysql_real_escape_string($_GET['field']) . "')";

This escaping is absolutely crucial to apply to input that you're using in a SQL query. Once it's applied as you see here, even malicious input from a hacker will have no ill effects on your database.
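The mysql_* functions were removed in PHP 7.0, so on a current PHP version the same idea is usually expressed with a prepared statement. The following is only a sketch under stated assumptions: it uses PDO with an in-memory SQLite database (requires the pdo_sqlite extension) so that it runs on its own, and $field is a hypothetical stand-in for $_GET['field'].

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one
// so that the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE some_table (some_field TEXT)');

// Hypothetical stand-in for $_GET['field'], including an injection attempt.
$field = "Miles O'Brien'); DROP TABLE some_table; --";

// The placeholder keeps the value separate from the SQL text,
// so the quote in the input cannot terminate the string literal.
$stmt = $pdo->prepare('INSERT INTO some_table (some_field) VALUES (:field)');
$stmt->execute([':field' => $field]);

// The value is stored verbatim and the table still exists.
$stored = $pdo->query('SELECT some_field FROM some_table')->fetchColumn();
echo $stored, "\n";
```

The input is stored exactly as typed, apostrophe included, which is the point of the answer: the data is never rejected, only handled correctly for the SQL context.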

However, this function is both useless and outright wrong to use if you're including $_GET['field'] in some HTML output from your page. In that case, the function htmlspecialchars() is useful. You might do something like:

echo "<p>Your comments were: " . htmlspecialchars($_GET['field']) . "</p>";

Both these examples are quite safe from "hacker-like inputs." You will not be inserting malicious data into your database or into your HTML. Yet, notice the two forms of escaping are completely different functions, each suited to its use.
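As a rough illustration of the HTML case, the sketch below runs a made-up value (a stand-in for $_GET['field']) through htmlspecialchars(). The ENT_QUOTES flag and the explicit 'UTF-8' charset are choices made here, not something the answer specifies.

```php
<?php
// Hypothetical stand-in for $_GET['field'].
$field = '<script>alert("x")</script> I think 1 < 3';

// ENT_QUOTES also escapes single quotes; the charset is given explicitly.
$html = '<p>Your comments were: '
      . htmlspecialchars($field, ENT_QUOTES, 'UTF-8')
      . '</p>';

// The markup characters arrive in the page as harmless entities.
echo $html, "\n";
```

The visitor's text is displayed as written, but the browser never sees a real script tag.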

By contrast, imagine if you tried to "validate" input for these two uses at the same time. You certainly couldn't allow < or > characters, since those could be part of a malicious HTML attack like Cross-Site Scripting. So, visitors who want to write "I think 1 < 3" would be stymied. Likewise, you couldn't allow quote marks for fear of malicious SQL injection attacks, so poor "Miles O'Brien" could never fill out your form!
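To see the problem in practice, consider a sketch of such a blanket filter. The function name looks_malicious() and its character list are hypothetical, invented here only to show how legitimate input gets rejected.

```php
<?php
// Hypothetical blanket filter: reject anything containing <, >, ' or ".
function looks_malicious(string $input): bool
{
    return preg_match('/[<>\'"]/', $input) === 1;
}

$samples = ['I think 1 < 3', "Miles O'Brien", 'plain text'];

foreach ($samples as $sample) {
    // Both perfectly innocent samples with special characters are rejected.
    echo $sample, ' => ', looks_malicious($sample) ? 'rejected' : 'accepted', "\n";
}
```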

Escaping input properly for each context in which you use it is very easy to do (it's often even easier than validating input!), yet the results are so much better.

VoteyDisciple
+1. Domain-specific escaping is the right way to deal with ‘special’ characters. ‘sanitization’ and ‘verification’ are not.
bobince
A: 

If you are worried about user input that will contain HTML characters and/or SQL injection types of attacks, look into the built-in PHP functions like htmlentities() and mysql_real_escape_string().

Please read the docs for details: http://us2.php.net/manual/en/security.database.sql-injection.php

Eugene