How do I make sure I don't escape something twice?
I've heard that it's good practice to escape values as you receive them from a form, and to escape them again when you output them. That way you have two chances to catch something.
You should only HTML-encode when you output something to a browser; this prevents XSS attacks. The escaping you do when you collect data from a form, before inserting it into a database, is not HTML encoding. It is escaping of characters that are special to the database (best done with parameterized queries), and its purpose is to prevent SQL injection attacks. These are two different encodings applied at two different boundaries, so no double encoding takes place.
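To illustrate what double escaping would look like if HTML encoding were applied both on input and on output, here is a minimal Java sketch. The class name and the `escapeHtml` helper are hypothetical and hand-written for illustration; the helper covers only the five characters commonly escaped in HTML, and in a real application you would normally rely on your templating layer or a maintained library instead.

```java
public class EscapeOnceDemo {

    // Hypothetical helper: replaces the five characters that are special in HTML.
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String submitted = "Tom & <b>Jerry</b>";

        // Keep the raw value as received; escape exactly once, when writing HTML.
        String escapedOnce = escapeHtml(submitted);
        System.out.println(escapedOnce);
        // Tom &amp; &lt;b&gt;Jerry&lt;/b&gt;

        // Escaping on input AND on output encodes the ampersands a second time,
        // so the browser would display literal "&lt;b&gt;" text to the user.
        String escapedTwice = escapeHtml(escapeHtml(submitted));
        System.out.println(escapedTwice);
        // Tom &amp;amp; &amp;lt;b&amp;gt;Jerry&amp;lt;/b&amp;gt;
    }
}
```

The second result is not more secure than the first, only wrong: the page shows entity text instead of the characters the user typed.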
I presume that you're using JSP.
Escape during display only. The JSTL <c:out> tag is perfectly suitable for that. It escapes HTML special characters by default (its escapeXml attribute defaults to true). Use it to display every user-controlled input, such as the request URL, request headers and request parameters.
E.g.
<input type="text" name="foo" value="<c:out value="${param.foo}" />">
Escaping during input is not needed. XSS does no harm in raw Java code or in SQL databases. Besides, you would rather save the data unmodified in the DB, so that you can still see what the user actually entered and, if necessary, take action against malicious users.
If you'd like to know what to guard against during input, it is SQL injection. In that case, just use PreparedStatement instead of a regular Statement whenever you want to save any user-controlled input in the database.
E.g.
create = connection.prepareStatement("INSERT INTO user (username, password) VALUES (?, MD5(?))");
create.setString(1, username);
create.setString(2, password);
create.executeUpdate();
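For contrast, here is a small sketch of what goes wrong when user input is spliced into the SQL text instead of being bound as a parameter. The class and method names are hypothetical, the `user` table and `username` column are borrowed from the example above, and no database is involved: the code only prints the query text that would be sent.

```java
public class ConcatenationDemo {

    // Deliberately unsafe: splices user input into the SQL text. For illustration only.
    static String buildUnsafeQuery(String username) {
        return "SELECT * FROM user WHERE username = '" + username + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildUnsafeQuery("alice"));
        // SELECT * FROM user WHERE username = 'alice'

        // The quote in the input closes the string literal,
        // and the remainder is parsed as part of the SQL statement.
        System.out.println(buildUnsafeQuery("' OR '1'='1"));
        // SELECT * FROM user WHERE username = '' OR '1'='1'
    }
}
```

With a PreparedStatement the same input is sent as a bound value and is never parsed as SQL, which is why no manual escaping is needed at input time.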
Content that is harmless in one context can be dangerous in another. The best way to avoid injection attacks is to prepare the content at the moment it is passed to another context. In your case, HTML text changes context when it is sent to the browser: the server doesn't render the HTML, the browser does. So make sure you pass no malicious HTML to the browser, and escape the content before sending it.
Another argument for doing so is that the attack code could be assembled within the application from two or more inputs. Each input is harmless on its own, but together they can become dangerous.
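A minimal Java sketch of that last point, under stated assumptions: `looksHarmless` stands in for a hypothetical input-time filter that rejects values containing "<script", and `escapeHtml` is a hypothetical hand-written escaper applied at output time. Neither is a real library API.

```java
public class CombinedInputDemo {

    // Hypothetical input-time filter: accepts a value unless it contains "<script".
    static boolean looksHarmless(String value) {
        return !value.toLowerCase().contains("<script");
    }

    // Hypothetical minimal HTML escaper, applied at output time.
    // The ampersand is replaced first so later entities are not re-encoded.
    static String escapeHtml(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;")
                  .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        String firstName = "<scr";
        String lastName = "ipt>alert(1)</script>";

        // Each value passes the input-time check on its own.
        System.out.println(looksHarmless(firstName)); // true
        System.out.println(looksHarmless(lastName));  // true

        // The application concatenates them, and the result is a script tag.
        String displayName = firstName + lastName;
        System.out.println(looksHarmless(displayName)); // false

        // Escaping the final value at output neutralizes it,
        // regardless of how it was assembled.
        System.out.println(escapeHtml(displayName));
        // &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

Because the escaping happens on the value that is actually written into the page, it does not matter how many inputs contributed to it.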