I find it much better to escape the data at the time it is used, not on the way in. You might want to use that data in JSON, XML, Shell, MySQL, Curl, or HTML and each will have it's own way of escaping the data.
Lets have a quick review of WHY escaping is needed in different contexts:
If you are in a quote delimited string, you need to be able to escape the quotes.
If you are in xml, then you need to separate "content" from "markup"
If you are in SQL, you need to separate "commands" from "data"
If you are on the command line, you need to separate "commands" from "data"
This is a really basic aspect of computing in general. Because the syntax that delimits data can occur IN THE DATA, there needs to be a way to differentiate the DATA from the SYNTAX, hence, escaping.
In web programming, the common escaping cases are:
1. Outputting text into HTML
2. Outputting data into HTML attributes
3. Outputting HTML into HTML
4. Inserting data into Javascript
5. Inserting data into SQL
6. Inserting data into a shell command
Each one has a different security implications if handled incorrectly. THIS IS REALLY IMPORTANT! Let's review this in the context of PHP:
Text into HTML:
htmlspecialchars(...)
Data into HTML attributes
htmlspecialchars(..., ENT_QUOTES)
HTML into HTML
Use a library such as HTMLPurifier to ENSURE that only valid tags are present.
Data into Javascript
I prefer json_encode
. If you are placing it in an attribute, you still need to use #2, such as
Inserting data into SQL
Each driver has an escape() function of some sort. It is best. If you are running in a normal latin1 character set, addslashes(...) is suitable. Don't forget the quotes AROUND the addslashes() call:
"INSERT INTO table1 SET field1 = '" . addslashes($data) . "'"
Data on the command line
escapeshellarg() and escapeshellcmd() -- read the manual
--
Take these to heart, and you will eliminate 95%* of common web security risks! (* a guess)