I hate to break it out to you, but -
- XSS is an Output problem, not an Input problem. Filtering/Validating input is an additional layer of defence, but it can never protect you completely from XSS. Take a look at XSS cheatsheet by RSnake - there's just too many ways to escape a filter.
- There is no easy way to fix a legacy application. You have to properly encode anything that you put in your html or javascript files, and that does mean revisiting every piece of code that generates html.
See OWASP's XSS prevention cheat sheet for information on how to prevent XSS.
Some comments below suggest that input validation is a better strategy rather than encoding/escaping at the time of output. I'll just quote from
OWASP's XSS prevention cheat sheet -
Traditionally, input validation has been the preferred approach for handling untrusted data. However, input validation is not a great solution for injection attacks. First, input validation is typically done when the data is received, before the destination is known. That means that we don't know which characters might be significant in the target interpreter. Second, and possibly even more importantly, applications must allow potentially harmful characters in. For example, should poor Mr. O'Malley be prevented from registering in the database simply because SQL considers ' a special character?
To elaborate - when the user enters a string like O'Malley, you don't know whether you need that string in javascript, or in html or in some other language. If its in javascript, you have to render it as O\x27Malley
, and if its in HTML, it should look like O'Malley
. Which is why it is recommended that in your database the string should be stored exactly the way the user entered, and then you escape it appropriately according to the final destination of the string.