views:

104

answers:

6

Hi,

I am thinking of secure ways to serve HTML and JSON to JavaScript. Currently I am just outputting the JSON like:

ajax.php?type=article&id=15
{
 "name":    "something",
 "content": "some content"
}

but I do realize this is a security risk -- because the articles are created by users. So, someone could insert script tags (just an example) into the content and link directly to his article through the AJAX API. I am now wondering what the best way to prevent such issues is. One way would be to encode all non-alphanumeric characters in the input, decode them in JavaScript, and encode them again whenever the content is inserted somewhere.

Another option could be to send some headers that force the browser to never render the response of the AJAX API requests (Content-Type and X-Content-Type-Options).

+4  A: 

Instead of worrying about how you could encode the malicious code when you return it, you should probably make sure that it does not even get into your database. A quick Google search about preventing cross-site scripting and input validation might help you here. Cheers

moxn
-1 Google it? You must be joking, this is SO!
Rook
So, you are suggesting Input Encoding. Well, that's one option, but it has to be very strict -- as the server cannot yet tell where the content will end up.
rFactor
@The Rook If you read carefully, you will notice that I suggested encoding the user input before storing it on the server (the OP apparently understood this, btw). The Google reference was just a helpful pointer in the right direction.
moxn
@rFactor That's true. There are various things you can do to improve security and usually using only one of them is not enough.
moxn
@moxn I disagree, data in the db shouldn't be encoded because it makes it more difficult to make comparisons. For something like a comment or a blog post it's not going to matter. But what about the date/time or an address? XSS is only a vulnerability when it reaches the client, and in most cases it's best to use HTML entity encoding before printing it out. In this case, it's not necessary if you follow the RFC and you use the Content-Type header properly.
Rook
@The Rook Point taken. But I am still convinced that if I don't allow the inclusion of JavaScript in, let's say, the comment section of a blog post, I will encode the content before storing it in the DB. I don't want to store potentially harmful content on my side...
moxn
@moxn there is a difference between potentially harmful and a vulnerability. Where do you draw the line? Do you also encode percent signs because they might lead to a format string vuln? It matters how the data is used, not where it's stored or where it comes from.
Rook
@The Rook I agree completely. The programmer has to draw this line. IMO there is a difference between script-tags and percent signs. All I offered was an answer for his question from my POV and apparently you and I draw lines at different places :)
moxn
@moxn I agree as well. I'm glad we can have a civil discussion on this topic. There isn't a single right answer.
Rook
A: 

I don't think your question is about validating user input, as others pointed out. You don't want to provide your JSON api to other people... right?

If this is the case then there isn't much you can do... in fact, even if you were serving HTML instead of JSON, people would still be doing HTML scraping to get what they wanted from your site (this is how Search Engine spiders work).

A good way to prevent scraping is to allow only a specific number of downloads per IP address. This way, if someone requests http://yoursite.com/somejson.json more than 100 times a day, you can be fairly sure it's a scraper and not someone visiting your page 100 times in one day.
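
As a rough illustration of the per-IP limit described above, here is a minimal in-memory sketch in JavaScript. The function name `createRateLimiter`, the fixed-window strategy, and the 100-per-day numbers are assumptions made for the example; a real deployment would more likely keep the counters in a shared store than in process memory.

```javascript
// Hypothetical fixed-window limiter; names and limits are illustrative only.
function createRateLimiter(maxRequests, windowMs) {
  const hits = new Map(); // ip -> { count, windowStart }

  // Returns true if the request from `ip` at time `now` is within the limit.
  return function isAllowed(ip, now = Date.now()) {
    const entry = hits.get(ip);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request from this IP, or the previous window has expired.
      hits.set(ip, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= maxRequests;
  };
}

// Example configuration: at most 100 requests per IP per day.
const DAY_MS = 24 * 60 * 60 * 1000;
const allowRequest = createRateLimiter(100, DAY_MS);
```

A request handler would call `allowRequest(clientIp)` and refuse the request when it returns false. As the comments under this answer note, this only slows down a scraper that stays on a single address.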

Luca Matteis
Okay, so the attacker will just use a list of proxy servers to scrape your site. Also input validation isn't always the right answer, and is not required to secure this potential vulnerability.
Rook
Define 'list of proxy servers'. There aren't many public proxy servers available, and it would surely be harder to set up the proxy servers yourself... but yeah, there are indeed ways to get around it.
Luca Matteis
A: 

Insertion of script tags (or SQL) is only a problem if you fail to neutralize it at the point where it could actually do harm.

A <script> tag in the middle of a comment that somebody submits will not hurt your server and it won't hurt your database. What it would hurt, if you fail to take appropriate measures, would be a page that includes the comment when you subsequently serve it up and it reaches a client browser. In order to prevent that from happening, your code that prepares the page must make sure that user-supplied content is always scrubbed before it is exposed to an unaware interpreter. In this case, that unaware interpreter is a client web browser. In fact, your client web browser really involves two unaware interpreters: the HTML parser & layout engine and the Javascript interpreter.

Another important example of an unaware interpreter is your database server. Note that a <script> tag is (almost certainly) harmless to your database, because "<script>" doesn't mean anything in SQL. It's other sorts of input that cause problems for SQL, like quotes in strings (which are harmless to your HTML pages!).

Stackoverflow would be pretty lame if I couldn't put <script> tags in my answers, as I'm doing now. Same goes for examples of SQL Injection attacks. Recently somebody linked a page from some prominent US bank, where a big <textarea> was footnoted by a warning not to include the characters "<" or ">" in whatever you typed. Predictably, the bank was ridiculed over hundreds of Reddit comments, and rightly so.

Exactly how you "scrub" user-supplied content depends on the unaware interpreter to which you're delivering it. If it's going to be dropped in the middle of HTML markup, then you have to make sure that the "<", ">", and "&" characters are all encoded as HTML entities. (You might want to do quote characters too, if the content might end up in an HTML element attribute value.) If the content is to be dropped into Javascript, however, you may not need to worry about HTML escaping, but you do need to worry about quotes, and possibly Unicode characters outside the 7-bit range.
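
For the HTML case, the encoding step might look like the sketch below. The helper name `escapeHtml` is made up for this example and is not taken from any library; it covers only the five characters mentioned above and is meant for text placed between tags or inside a quoted attribute value.

```javascript
// Hypothetical helper: maps the characters that are significant in HTML
// markup and quoted attribute values to their entity equivalents.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  // Each matched character is replaced exactly once, so "&" in the input
  // is never double-encoded by a later replacement.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}
```

Content headed for a JavaScript string literal needs a different encoder (for example a JSON serializer), which is the point of choosing the escaping per interpreter.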

Pointy
+1  A: 

If the user has to be logged in to view the web page then secure the ajax.php with the same authorization mechanism. Then a client that's not logged in cannot access ajax.php directly to retrieve the data.

Kwebble
+4  A: 

If you set the Content-Type to application/json then NO browser will execute JavaScript on that page. This is a part of RFC 4627, and Google uses this to protect themselves. Other application/* content types follow similar rules.
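
A minimal sketch of what such a response could look like, written in JavaScript rather than the asker's PHP. The function `buildJsonResponse` and its return shape are assumptions for illustration only. It also sets the X-Content-Type-Options header mentioned in the question, and escapes "<", ">" and "&" inside the JSON as an extra precaution in case a client ignores the declared type.

```javascript
// Hypothetical helper: returns the headers and body for a JSON reply.
function buildJsonResponse(data) {
  // In JSON.stringify output these characters can only occur inside string
  // literals, so replacing them with \uXXXX escapes keeps the JSON valid.
  const body = JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");

  return {
    status: 200,
    headers: {
      "Content-Type": "application/json; charset=utf-8",
      "X-Content-Type-Options": "nosniff", // ask the browser not to sniff the type
    },
    body,
  };
}
```

The escaped body still parses back to the original values, so client code calling `JSON.parse` does not need to change.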

You still have to worry about DOM-based XSS; however, this would be a problem with your JavaScript, not really with the content of the JSON. Another, more exotic security concern with JSON is information leakage, like this vulnerability in Gmail.
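
On the client side, one way to stay clear of DOM-based XSS is to insert the received values as text instead of markup. The sketch below assumes the response shape shown in the question; the function names `renderArticle` and `loadArticle` are hypothetical.

```javascript
// Writes the article body into `container` as plain text.
function renderArticle(container, article) {
  // textContent is never parsed as HTML, so a "<script>" in the content is
  // displayed literally. Assigning the same string to innerHTML would have
  // the browser parse it as markup.
  container.textContent = article.content;
}

// Hypothetical browser-side usage: fetch the JSON and render it safely.
function loadArticle(id, container) {
  return fetch("ajax.php?type=article&id=" + encodeURIComponent(id))
    .then((response) => response.json())
    .then((article) => renderArticle(container, article));
}
```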

Make sure to always test your code. There is the free Acunetix XSS scanner, and you could test this manually with a simple <script>alert(/xss/)</script>.

Rook
Perfect answer!
rFactor
@rFactor Thank you! I'm happy to help.
Rook
A: 

For outputting safe HTML from PHP, I recommend http://htmlpurifier.org/

DGM
...Or you could kill a fly with a brick.
Rook