Is there a catchall function somewhere that works well for sanitizing user input for sql injection and XSS attacks, while still allowing certain types of html tags?
Regular Expressions: http://us2.php.net/manual/en/regex.examples.php
No. You can't generically filter data without any context of what it's for. Sometimes you'd want to take a SQL query as input and sometimes you'd want to take HTML as input.
You need to filter input on a whitelist -- ensure that the data matches some specification of what you're expect. Then you need to escape it before you use it, depending on the context in which you are using it.
The process of escaping data for SQL - to prevent SQL injection - is very different from the process of escaping data for (X)HTML, to prevent XSS.
There is the filter extension (howto-link, manual), which works pretty well with all GPC variables. It's not a magic-do-it-all thing though, you will still have to use it.
To address the XSS issue, take a look at HTML Purifier. It is fairly configurable and has a decent track record.
As for the SQL injection attacks, make sure you check the user input, and then run it though mysql_real_escape_string(). The function won't defeat all injection attacks, though, so it is important that you check the data before dumping it into your query string.
A better solution is to use prepared statements. The PDO library and mysqli extension support these.
No, there is not.
First of all, SQL injection is an input filtering problem, and XSS is an output escaping one - so you wouldn't even execute these two operations at the same time in the code lifecycle.
Basic rules of thumb
- For SQL query, bind parameters (as with PDO) or use a driver-native escaping function for query variables (such as
mysql_real_escape_string()
) - Use
strip_tags()
to filter out unwanted HTML - Escape all other output with
htmlspecialchars()
and be mindful of the 2nd and 3rd parameters here.
It's a common misconception that user input can be filtered. PHP even has a (now deprecated) "feature", called magic-quotes, that builds on this idea. It's nonsense. Forget about filtering (Or cleaning, or whatever people call it).
What you should do, to avoid problems is quite simple: Whenever you embed a string within foreign code, you must escape it, according to the rules of that language. For example, if you embed a string in some SQL targeting MySql, you must escape the string with MySql's function for this purpose (mysql_real_escape_string
).
Another example is HTML; If you embed strings within HTML markup, you must escape it with htmlspecialchars
. This means that every single echo
or print
statement should use htmlspecialchars
.
A third example could be shell commands; If you are going to embed strings (Such as arguments) to external commands, and call them with exec
, then you must use escapeshellcmd
and escapeshellarg
.
And so on and so forth ...
The only case where you need to actively filter data, is if you're accepting preformatted input. Eg. if you let your users post HTML markup, that you plan to display on the site. However, you should be wise to avoid this at all cost, since no matter how well you filter it, it will always be a potential security hole.
Do not try to prevent SQL injection by sanitizing input data.
Instead, do not allow data to be used in creating your SQL code. Use parameterized SQL that uses bound variables. It is the only way to be guaranteed against SQL injection.
Please see my website http://bobby-tables.com/ for more about preventing SQL injection.
One trick that can help in the specific circumstance where you have a page like /mypage?id=53
and you use the id in a WHERE clause is to ensure that id definitely is an integer, like so:
if (isset($_GET['id'])) {
$id = $_GET['id'];
settype($id, 'integer');
$result = mysql_query("SELECT * FROM mytable WHERE id = '$id'");
# now use the result
}
But of course that only cuts out one specific attack, so read all the other answers. (And yes I know that the code above isn't great, but it shows the specific defence.)