The concept of cleaning input never made much sense to me. It's based on the assumption that some kinds of input are dangerous, but in reality there is no such thing as dangerous input; Just code that handles input wrongly.
The culprit of it is that if you embed a variable inside some kind of string (code), which is then evaluated by any kind of interpreter, you must ensure that the variable is properly escaped. For example, if you embed a string in a SQL-statement, then you must quote and escape certain characters in this string. If you embed values in a URL, then you must escape it with urlencode
. If you embed a string within a HTML document, then you must escape with htmlspecialchars
. And so on and so forth.
Trying to "clean" data up front is a doomed strategy, because you can't know - at that point - which context the data is going to be used in. The infamous magic_quotes anti-feature of PHP, is a prime example of this misguided idea.