I've been working on a forum-like system, which does not allow for HTML formatting. The method I currently use is to escape HTML entities before they get inserted into the database. I've been told (in relation to XSS vulnerabilities) that I should insert the raw comment into the database, and escape HTML entities upon output.

Other questions I've seen here on the matter seem to assume that HTML would or could still be used for formatting, so I'm asking about the case where HTML is not used at all.

+9  A: 

Yes, because at some stage you'll want access to the original input. This is because...

  • You never know how you'll want to display it: in JSON, in HTML, as an SMS?
  • You may need to show it back to the user as is.
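A minimal PHP sketch of the idea, assuming the comment is stored exactly as typed and each output format applies its own escaping at render time. The helper names `render_html()` and `render_json()` are hypothetical, not something from the thread.

```php
<?php
// Sketch: the stored value is the raw comment; escaping happens per output format.
// render_html() and render_json() are hypothetical helper names.

function render_html(string $raw): string
{
    // Escape for HTML; ENT_QUOTES also escapes both quote characters,
    // so the result is safe inside a quoted attribute value too.
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function render_json(string $raw): string
{
    // JSON has its own escaping rules; json_encode() applies them.
    return json_encode(['comment' => $raw]);
}

$raw = 'Tom & Jerry say 3 < 4';   // what the user typed, stored unchanged

echo render_html($raw), "\n";     // Tom &amp; Jerry say 3 &lt; 4
echo render_json($raw), "\n";     // {"comment":"Tom & Jerry say 3 < 4"}
echo $raw, "\n";                  // plain text, e.g. for an SMS
```

The same stored value feeds every format, and none of them has to undo another format's escaping first.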

I do see your point about never wanting HTML entered. What are you using to strip HTML tags? If it's a regex, then look out for confused users who might type something like this...

3 < 4 yes, :->

If it is a regex, all that survives is the 3 and a space.
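To illustrate the pitfall, here is a sketch using a naive tag-stripping pattern (`/<[^>]*>/`, an assumed example of the kind of regex meant), compared with simply escaping the same input:

```php
<?php
// Sketch of the pitfall: a naive "strip tags" regex treats everything from
// the first "<" to the next ">" as a tag, even when it is not one.
// The pattern below is an assumed example, not taken from the question.
$input = '3 < 4 yes, :->';

$stripped = preg_replace('/<[^>]*>/', '', $input);
echo "[$stripped]\n";   // [3 ]

// Escaping instead keeps the whole message; the browser shows it literally.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";    // 3 &lt; 4 yes, :-&gt;
```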

alex
I'm using `htmlentities()` in PHP, which (for example) turns `<` into `&lt;`
a2h
+1! I agree. Add to that the case where you change how you're doing your escaping, or you decide later that you want to allow certain tags like `<b>`, `<i>`, `<u>` and `<a>`. Escaping the data on the way out is future-proof.
mattmc3
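Because the raw text is kept, the output step can later be relaxed to allow a few tags. One rough way, assuming only bare `<b>`, `<i>` and `<u>` are wanted, is to escape everything and then restore the exact escaped forms of those tags. `render_with_basic_tags()` is a hypothetical name, and `<a>` is deliberately left out because links need attribute and URL checks.

```php
<?php
// Sketch: escape everything, then re-enable a small allowlist of
// attribute-free tags. The allowlist is an assumption for illustration.

function render_with_basic_tags(string $raw): string
{
    $escaped = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    $allowed = ['b', 'i', 'u'];
    $from = [];
    $to = [];
    foreach ($allowed as $tag) {
        // Only the exact escaped forms of <tag> and </tag> are restored;
        // anything carrying attributes stays escaped.
        $from[] = "&lt;$tag&gt;";
        $to[]   = "<$tag>";
        $from[] = "&lt;/$tag&gt;";
        $to[]   = "</$tag>";
    }
    return str_replace($from, $to, $escaped);
}

echo render_with_basic_tags('<b>hi</b> <script>alert(1)</script>'), "\n";
// <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

This does not check that tags are balanced, so a stray `<b>` can bold the rest of the page; a dedicated HTML sanitizer is the sturdier choice if the allowlist grows.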
@a2h I tagged your question accordingly. I would use `htmlentities()` to display in your HTML, if that is how you wanted it displayed.
alex
knittl
@knittl Yeah, I'd use `htmlspecialchars()` too. I added the *if that is how you wanted it displayed* because he may want to have everything encoded to its entity.
alex
Using either or even both of these functions will not be a magic bullet that protects you from XSS under all circumstances. See the excellent reference in knittl's answer, which demonstrates the 6 different injection environments and the rules you need in each of them.
Cheekysoft
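A short sketch of the difference between the two functions discussed above, assuming UTF-8 input. The last lines illustrate Cheekysoft's point: in a context such as an `href` value, a `javascript:` URL contains no characters that either function escapes.

```php
<?php
// Sketch comparing htmlspecialchars() and htmlentities(); UTF-8 assumed.
$raw = 'Café <b>"quoted"</b>';

// Encodes only the HTML-special characters: & < > " (and ' with ENT_QUOTES).
echo htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'), "\n";
// Café &lt;b&gt;&quot;quoted&quot;&lt;/b&gt;

// Also converts every character that has a named entity, e.g. é -> &eacute;.
echo htmlentities($raw, ENT_QUOTES, 'UTF-8'), "\n";
// Caf&eacute; &lt;b&gt;&quot;quoted&quot;&lt;/b&gt;

// Neither helps in other contexts: this URL passes through unchanged and
// would still run if placed in an href attribute.
$url = 'javascript:alert(1)';
var_dump(htmlspecialchars($url, ENT_QUOTES, 'UTF-8') === $url); // bool(true)
```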
+5  A: 

You will also restrict yourself by performing the escaping before inserting into your DB. Say you later decide to output not HTML but JSON, plain text, etc.

If you have stored escaped HTML in your DB, you would first have to 'unescape' the stored value, only to re-escape it into a different format.
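A sketch of that round trip, assuming the value was escaped with `htmlspecialchars()` before the INSERT (the database itself is left out):

```php
<?php
// Sketch: escaping before the INSERT means the stored value is already HTML.
$typed  = 'Tom & Jerry: 3 < 4';
$stored = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8');  // what ends up in the DB
echo $stored, "\n";                 // Tom &amp; Jerry: 3 &lt; 4

// Emitting JSON straight from the DB leaks HTML entities into the JSON.
echo json_encode($stored), "\n";    // "Tom &amp; Jerry: 3 &lt; 4"

// So the value has to be unescaped first, only to be re-escaped for JSON.
$original = htmlspecialchars_decode($stored, ENT_QUOTES);
echo json_encode($original), "\n";  // "Tom & Jerry: 3 < 4"
```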

Also see this perfect OWASP article on XSS prevention.

knittl
A: 

I usually store both versions of the text. The escaped/formatted text is used when a normal page request is made to avoid the overhead of escaping/formatting every time. The original/raw text is used when a user needs to edit an existing entry, and the escaping/formatting only occurs when the text is created or changed. This strategy works great unless you have tight storage space constraints, since you will be duplicating data.
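A minimal sketch of storing both versions, with a plain array standing in for the database row. The column names `raw_body` and `html_body` and the function `save_comment()` are hypothetical.

```php
<?php
// Sketch of the "store both" approach; the array stands in for a DB row.

function save_comment(string $raw): array
{
    // Escape once, at write time, and keep the original alongside it.
    return [
        'raw_body'  => $raw,
        'html_body' => htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
    ];
}

$row = save_comment('I <3 PHP');

// Normal page view: use the pre-escaped copy, no escaping per request.
echo '<p>', $row['html_body'], "</p>\n";   // <p>I &lt;3 PHP</p>

// Edit form: start from exactly what the user typed. It still needs
// escaping to sit inside a textarea, but only on the rarer edit path.
echo '<textarea>', htmlspecialchars($row['raw_body'], ENT_QUOTES, 'UTF-8'), "</textarea>\n";

// An edit regenerates both columns together so they never drift apart.
$row = save_comment('I <3 PHP & SQL');
```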

limscoder