If I HTML encode any data entered by website users when I redisplay it, will this prevent XSS (cross-site scripting) vulnerabilities?
Also, is there a tool/product available that will sanitize my user input for me, so that I don't have to write my own routines?
HTML-encoding input gets you a good portion of the way, because the browser then displays the markup as text instead of rendering it.
Depending on your language, a built-in function should exist to do this. In .NET you can use Server.HtmlEncode(txtInput.Text) to encode data from a textbox named txtInput.
As others have mentioned, more is needed to be truly protected.
Encoding your HTML is a start, but it does not protect against all XSS attacks.
If you use PHP, here is a good function you can use in your sites: Kallahar's RemoveXSS() function.
If you don't use PHP, at least the code is well commented, explaining the purpose of each section, and could then be adapted to another programming language.
The answer is no, encoding is not enough. The best protection against XSS is a combination of "whitelist" validation of all incoming data and appropriate encoding of all output data. Validation allows the detection of attacks, and encoding prevents any successful script injection from running in the browser. If you are using .NET you can check this library: http://msdn.microsoft.com/en-us/library/aa973813.aspx
You can also check some cheat sheets to test your protections: http://ha.ckers.org/xss.html
Regards,
Victor
There are various subtleties to this question, although the answer in general is yes.
The safety of your website is highly dependent on where you put the data. If you put it in as regular, escaped text, there is essentially no way for the attacker to execute XSS. If you put it in an attribute and forget to escape quotes or don't check for multibyte well-formedness, you have a possible attack. If you put it in a JSON variable, not escaping properly can lead to arbitrary JavaScript. And so on. Context is very important.
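To make the point about context concrete, here is a minimal sketch in Python (chosen only for illustration; the function names are made up for this example). It assumes three output contexts: element text, a quoted attribute value, and JSON embedded in a script block. It does not cover the multibyte well-formedness check mentioned above.

```python
import html
import json

def for_html_text(value: str) -> str:
    # Element content: escaping &, < and > is enough here.
    return html.escape(value, quote=False)

def for_html_attribute(value: str) -> str:
    # Attribute values: quotes must be escaped too, and the template
    # must always write the attribute value inside quotes.
    return html.escape(value, quote=True)

def for_script_json(value) -> str:
    # JSON inside a <script> block: json.dumps handles quotes and
    # backslashes, but a literal "</script>" would still end the block,
    # so <, > and & are rewritten as \uXXXX escapes.
    encoded = json.dumps(value)
    return (encoded.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))
```

The same payload needs a different treatment in each place, which is why a single "sanitize everything" call applied at input time tends to be either too weak or too destructive.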
Other users have suggested using XSS removal or XSS detection functions. I tend to think of XSS removal as user unfriendly; if I post an email address like <[email protected]> and your XSS removal function thinks it's an HTML tag, this text mysteriously disappears. If I am running an XSS discussion forum, I don't want people's sample code to be removed. Detection is a little more sensible; if your application can tell when someone is attacking it, it can ban the IP address or user account. You should be careful with this sort of functionality, however; innocents can and will get caught in the crossfire.
Validation is an important part of website logic, but it's also independent of escaping. If I don't validate anything but escape everything, there will be no XSS attacks, but someone can say that their birthday is "the day the music died", and the application wouldn't be the wiser. In theory, strict enough validation for certain data types can perform all the duties of escaping (think numbers, enumerations, etc.), but it's good general practice, as defense in depth, to escape them anyway. Even if you're 100% sure it's an integer. It might not be.
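A rough sketch of how the two concerns stay separate, again in Python and with hypothetical function names: validation decides whether the value makes sense, escaping decides how it is written into the page, and the escaping happens on both paths.

```python
import html
from datetime import date, datetime
from typing import Optional

def parse_birthday(raw: str) -> Optional[date]:
    # Validation: accept only input that parses as a year-month-day date.
    try:
        return datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        return None

def render_birthday(raw: str) -> str:
    birthday = parse_birthday(raw)
    if birthday is None:
        # Escaping: rejected input is still echoed back safely.
        return "<p>Not a valid date: " + html.escape(raw) + "</p>"
    # Defense in depth: escape even though the value was validated.
    return "<p>Birthday: " + html.escape(birthday.isoformat()) + "</p>"
```

With this split, "the day the music died" is rejected by validation, and a value containing markup is harmless whether or not validation catches it.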
Escaping plaintext is a trivial problem; if your language doesn't give you a function, a string replace for <, >, ", ' and & with their corresponding HTML entities will do the trick. (You need other HTML entities only if you're not using UTF-8.) Allowing HTML tags is non-trivial, and merits its own Stack Overflow question.
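For illustration, the string-replace approach might look like the following in Python. The function name is made up for this sketch; Python already ships html.escape in its standard library, so this is only what you would write if your language had no equivalent. The one detail that matters is the order: & has to be replaced first.

```python
def escape_html(text: str) -> str:
    """Replace the five HTML-significant characters with entities."""
    # '&' must be handled first; otherwise the '&' inside entities
    # produced by the later replacements would be escaped a second time.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#39;"),
    ]
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text
```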