%3Cscript%3Ealert%28123%29%3C%2Fscript%3E is the URL-encoded form of <script>alert(123)</script>. Any time you include < in a form value, it will be submitted to the server as %3C. PHP will read and decode that back to < before anything in your application gets a look at it.
That is to say, there is no special encoding that you have to handle; you won't actually see %3C in your input, you'll see <. If you're failing to encode that for on-page display, then you don't have even the most basic defenses against XSS.
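To make the decoding step concrete, here is a small sketch. PHP performs the decoding automatically when it fills $_GET and $_POST; urldecode() is called by hand here only so the step is visible. The variable names are illustrative.

```php
<?php
// The URL-encoded value as it would travel in the request.
$encoded = '%3Cscript%3Ealert%28123%29%3C%2Fscript%3E';

// PHP does this for you when populating $_GET / $_POST;
// calling urldecode() here just makes the step visible.
$decoded = urldecode($encoded);

// Escaping happens only when the value is written into HTML.
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // <script>alert(123)</script>
echo $safe, "\n";    // &lt;script&gt;alert(123)&lt;/script&gt;
```

The application only ever sees the decoded string, so the HTML escaping at output is the part you are responsible for.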
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
I'm afraid you have not fixed your XSS problems at all. You may have merely obfuscated them.
Input filtering is a depressingly common but quite wrong strategy for blocking XSS.
It is not the input that's the problem. As your boss says, there is no reason you shouldn't be able to input O'Brien. Or even <script>, like I am just now in this comment box. You should not attempt to strip tags in the input or even HTML-encode them, because who knows at input time that the data is going to end up in an HTML page? You don't want your database filled with nonsense like 'Fish&amp;Chips' which then ends up in an e-mail or other non-HTML context with weird HTML escapes in it.
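A rough illustration of why input-time encoding goes wrong, assuming a hypothetical product name that is later used both in a plain-text e-mail and in an HTML page (all variable names here are made up for the example):

```php
<?php
// Hypothetical raw value typed by a user.
$raw = 'Fish & Chips';

// Wrong: encoding at input time and storing the result.
$storedEncoded = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// The stored escapes leak into a non-HTML context...
$emailBad = 'Your order: ' . $storedEncoded;

// ...and get encoded a second time if the template also escapes.
$htmlDouble = htmlspecialchars($storedEncoded, ENT_QUOTES, 'UTF-8');

// Right: keep the raw string and encode per output context.
$emailGood = 'Your order: ' . $raw;
$htmlGood  = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $emailBad, "\n";   // Your order: Fish &amp; Chips
echo $htmlDouble, "\n"; // Fish &amp;amp; Chips
echo $emailGood, "\n";  // Your order: Fish & Chips
echo $htmlGood, "\n";   // Fish &amp; Chips
```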
HTML-encoding is an output-stage issue. Leave the incoming strings alone, keep them as raw strings in the database (of course, if you are hacking together queries in strings to put the data in the database instead of parameterised queries, you would need to SQL-escape the content at exactly that point). Then only when you are inserting the values in HTML, encode them:
Name: <?php echo htmlspecialchars($row['name']); ?>
If you have a load of dodgy code like echo "Name: $name"; then I'm afraid you have much rewriting to do to make it secure.
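The store-raw, escape-at-output flow might look roughly like the following. This is only a sketch: it assumes the pdo_sqlite extension is available and uses an in-memory database with a made-up customers table so that it is self-contained. Substitute your own connection and schema.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE customers (name TEXT)');

// Store the raw string. The bound parameter means no SQL escaping by hand.
$input = "O'Brien <script>";
$stmt = $pdo->prepare('INSERT INTO customers (name) VALUES (:name)');
$stmt->execute([':name' => $input]);

// Read it back unchanged.
$row = $pdo->query('SELECT name FROM customers')->fetch(PDO::FETCH_ASSOC);

// HTML-encode only at the moment the value goes into HTML.
$html = 'Name: ' . htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // Name: O&#039;Brien &lt;script&gt;
```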
Hint: consider defining a function with a short name like h so you don't have to type htmlspecialchars so much. Don't use htmlentities, which will usually unnecessarily encode non-ASCII characters, and which will also mess them up unless you supply a correct $charset argument.
(Or, if you are using Zend_View, $this->escape().)
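One possible shape for such a helper, assuming your pages are served as UTF-8 (the ENT_QUOTES flag and the explicit encoding are choices made for this sketch, not something the hint above prescribes):

```php
<?php
/**
 * Short alias for HTML-escaping a string at output time.
 * Assumes the page encoding is UTF-8.
 */
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo 'Name: ', h("O'Brien"), "\n";       // Name: O&#039;Brien
echo 'Dish: ', h('Fish & Chips'), "\n";  // Dish: Fish &amp; Chips
echo 'City: ', h('Zürich'), "\n";        // City: Zürich (non-ASCII left alone)
```

In a template this keeps the escaping visible but cheap to type: Name: <?php echo h($row['name']); ?>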
Input validation is useful on an application-specific level, for things like ensuring telephone number fields contain numbers and not letters. It is not something you can apply globally to avoid having to think about the issues that arise when you put a string inside the context of another string—whether that's inside HTML, SQL, JavaScript string literals or one of the many other contexts that require escaping.
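As a rough sketch of the distinction: the first function below is an application-specific validation rule, and the second is an escaper for one particular output context (a JavaScript string literal inside a script block). The function names and the phone-number pattern are hypothetical; a real rule would depend on which formats your application accepts.

```php
<?php
// Validation: a field-specific rule, deliberately loose, for illustration only.
function isPlausiblePhone(string $value): bool
{
    return preg_match('/\A\+?[0-9][0-9 ()-]{5,19}\z/', $value) === 1;
}

// Escaping for a JavaScript string literal embedded in an HTML page.
// The JSON_HEX_* flags keep <, >, &, ' and " out of the output so the
// value cannot close the surrounding script element or attribute.
function jsString(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

var_dump(isPlausiblePhone('+44 20 7946 0958')); // bool(true)
var_dump(isPlausiblePhone('call me'));          // bool(false)

$name = "O'Brien </script>";
echo 'var name = ', jsString($name), ";\n";
```

Validation decides whether a value is acceptable for a field. It does not remove the need to escape that value for whichever context it is later written into.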