I have gone through the OWASP Top Ten vulnerabilities and found that Cross-Site Scripting is the one we have to take note of. A few solutions were recommended. One stated: do not use "blacklist" validation to detect XSS in input or to encode output. Searching for and replacing just a few characters ("<", ">" and other similar characters, or phrases such as "script") is weak and has been attacked successfully. Even an unchecked "" tag is unsafe in some contexts. XSS has a surprising number of variants that make it easy to bypass blacklist validation. Another solution recommended strong output encoding: ensure that all user-supplied data is appropriately entity encoded (either HTML or XML, depending on the output mechanism) before rendering. So, which is the best way to prevent cross-site scripting: validating and replacing the input, or encoding the output?
My preference is to encode all non-alphanumeric characters as HTML numeric character entities. Since almost all, if not all, attacks require non-alphanumeric characters (like <, ", etc.), this should eliminate a large chunk of dangerous output.
The format is &#N;, where N is the numeric value of the character (you can just cast the character to an int and concatenate it with a string to get a decimal value). For example:
// Java: encode every non-alphanumeric character as a numeric character reference
StringBuilder safe = new StringBuilder(input.length() * 4);
for (char c : input.toCharArray()) {
    if (Character.isLetterOrDigit(c)) {
        safe.append(c);                                // letters and digits pass through
    } else {
        safe.append("&#").append((int) c).append(';'); // everything else becomes &#N;
    }
}
String safeOutput = safe.toString();
You will also need to be sure that you are encoding immediately before outputting to the browser, to avoid double-encoding, or encoding for HTML but sending to a different location.
Use both. In fact, refer to a guide like the OWASP XSS Prevention Cheat Sheet on the possible cases for using output encoding and input validation.
Input validation helps in cases where you cannot rely on output encoding. For instance, you're better off validating input that appears in URLs rather than encoding the URLs themselves (Apache will not serve a URL that is url-encoded). The same goes for input that ends up in JavaScript expressions.
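As a rough illustration of validating a user-supplied URL instead of encoding it (the method name and the http/https-only policy are assumptions for the sketch, not something prescribed above):

import java.net.URI;
import java.net.URISyntaxException;

// Sketch: accept only plain http(s) URLs with a host, which rules out
// vectors such as javascript: or data: URIs.
public static boolean isAcceptableUrl(String candidate) {
    try {
        URI uri = new URI(candidate);
        String scheme = uri.getScheme();
        return ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme))
                && uri.getHost() != null;
    } catch (URISyntaxException e) {
        return false;
    }
}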
Ultimately, a simple rule of thumb helps: if you do not trust user input enough, or if you suspect that certain sources can result in XSS attacks despite output encoding, validate it against a whitelist.
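A minimal sketch of such whitelist validation (the field and the pattern here are made up for illustration, not taken from any particular guide):

import java.util.regex.Pattern;

// Whitelist validation: accept only values matching a known-good pattern,
// instead of trying to strip out characters that look dangerous.
public final class UsernameValidator {
    // Hypothetical policy: 3-20 letters, digits, dots, underscores or hyphens.
    private static final Pattern USERNAME = Pattern.compile("^[A-Za-z0-9._-]{3,20}$");

    public static boolean isValid(String input) {
        return input != null && USERNAME.matcher(input).matches();
    }
}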
Do take a look at the OWASP ESAPI source code to see how the output encoders and input validators are written in a security library.
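For a feel of what using such a library looks like, here is a sketch with ESAPI's encoder (ESAPI needs an ESAPI.properties configuration on the classpath, and the variable names are just placeholders):

import org.owasp.esapi.ESAPI;

// Encode user-controlled data for the specific output context it will land in.
String safeHtml      = ESAPI.encoder().encodeForHTML(userComment);
String safeAttribute = ESAPI.encoder().encodeForHTMLAttribute(userName);
String safeJs        = ESAPI.encoder().encodeForJavaScript(userName);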
The normal practice is to HTML-escape any user-controlled data when redisplaying it in JSP, not while processing the submitted data in the servlet, nor while storing it in the DB. In JSP you can use the JSTL <c:out> tag or the fn:escapeXml function for this (to install JSTL, just drop jstl-1.2.jar in /WEB-INF/lib). E.g.
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
...
<p>Welcome <c:out value="${user.name}" /></p>
and
<%@ taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
...
<input name="username" value="${fn:escapeXml(param.username)}">
That's it. No need for a blacklist. Note that user-controlled data covers everything that comes in via an HTTP request: the request parameters, the request body and the headers(!!).
If you HTML-escape it while processing the submitted data and/or storing it in the DB as well, then it's all spread over the business code and/or the database. That's only maintenance trouble, and you risk double-escaping (or worse) when you do it in different places: e.g. & would become &amp;amp; instead of &amp;, so that the end user would literally see &amp; instead of & in the view. The business code and the DB are in turn not sensitive to XSS; only the view is. So you should escape it only right there, in the view.
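To make the double-escape pitfall concrete, here is a small illustration using Apache Commons Text's escapeHtml4 (the library is just an example for demonstration; the approach above does not prescribe it):

import org.apache.commons.text.StringEscapeUtils;

// Escaping the same value in two layers corrupts it for the end user.
String raw   = "Fish & Chips";
String once  = StringEscapeUtils.escapeHtml4(raw);   // "Fish &amp; Chips"     -> browser shows "Fish & Chips"
String twice = StringEscapeUtils.escapeHtml4(once);  // "Fish &amp;amp; Chips" -> browser shows "Fish &amp; Chips"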