I'll be inserting content from remote sources into a web app. The sources should be limited/trusted, but there are still a couple of problems:

The remote sources could

1) be hacked and inject bad things

2) overwrite objects in my global namespace

3) I might eventually open it up for users to enter their own remote source. (It would be up to the user to not get in trouble, but I could still reduce the risk.)

So I want to neutralize any/all injected content just to be safe.

Here's my plan so far:

1) find and remove all inline event handlers

str.replace(/(<[^>]+\bon\w+\s*=\s*["']?)/gi,"$1return;"); // untested

Ex.

<a onclick="doSomethingBad()" ...

would become

<a onclick="return;doSomethingBad()" ...

2) remove all occurrences of these tags: script, embed, object, form, iframe, or applet

3) find all occurrences of the word script within a tag and replace the word script with the HTML entities for it

str.replace(/(<[^>]+)(script)/gi,toHTMLEntitiesFunc); // untested

would take care of

<a href="javascript: ..."

4) lastly, any src or href attribute that doesn't start with http should have the domain name of the remote source prepended to it
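Step 4 can be sketched with the standard `URL` constructor rather than string prepending, since it also handles `../`, root-relative, and protocol-relative values. This is only a minimal sketch, assuming an environment that provides `URL`; the helper name `resolveAgainstSource` and the example.com addresses are hypothetical. It additionally drops anything that does not resolve to http or https, which covers `javascript:` values as well.

```javascript
// Hypothetical helper: resolve a src/href value against the remote source's URL.
// Returns an absolute http(s) URL string, or null if the value should be dropped.
function resolveAgainstSource(value, sourceUrl) {
  let resolved;
  try {
    // The URL constructor resolves relative values against the base.
    resolved = new URL(value, sourceUrl);
  } catch (e) {
    return null; // unparsable value or base
  }
  // Only keep web URLs; javascript:, data:, etc. are rejected.
  if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
    return null;
  }
  return resolved.href;
}

// Usage (assumed example addresses):
// resolveAgainstSource("img/a.png", "http://example.com/dir/page.html")
//   returns "http://example.com/dir/img/a.png"
// resolveAgainstSource("javascript:alert(1)", "http://example.com/")
//   returns null
```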

My question: Am I missing anything else? Other things that I should definitely do or not do?


Edit: I have a feeling that responses are going to fall into a couple camps.

1) The "Don't do it!" response

Okay, if someone wants to be 100% safe, they need to disconnect the computer.

It's a balance between usability and safety.

There's nothing to stop a user from just going to a site directly and being exposed. If I open it up, it will be a user entering content at their own risk. They could just as easily enter a given URL into their address bar as in my form. So unless there's a particular risk to my server, I'm okay with those risks.

2) The "I'm aware of common exploits and you need to account for this ..." response ... or You can prevent another kind of attack by doing this ... or What about this attack ...?

I'm looking for the second type, unless someone can provide specific reasons why my approach would be more dangerous than what the user can do on their own.

+1  A: 

Instead of sanitizing (black listing), I'd suggest you set up a white list and ONLY allow those very specific things.

The reason for this is that you will never, never, never catch all variations of malicious script. There are just too many of them.
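A rough sketch of what a white list could look like, assuming the remote HTML has already been parsed into a DOM tree by the browser (for example with `DOMParser`) rather than with regular expressions. The tag and attribute lists, and the names `isAllowedAttribute` and `sanitizeTree`, are illustrative assumptions, not a vetted policy.

```javascript
// Illustrative white lists (assumptions): anything not listed is removed.
const ALLOWED_TAGS = new Set(["a", "b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "span", "div", "img"]);
const ALLOWED_ATTRS = new Map([
  ["a", new Set(["href", "title"])],
  ["img", new Set(["src", "alt", "title"])]
]);
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

// Attribute values still have to be checked: URLs must resolve to http(s).
function isAllowedAttribute(tag, name, value, baseUrl) {
  const allowed = ALLOWED_ATTRS.get(tag);
  if (!allowed || !allowed.has(name)) return false;
  if (name === "href" || name === "src") {
    try {
      return ALLOWED_SCHEMES.has(new URL(value, baseUrl).protocol);
    } catch (e) {
      return false;
    }
  }
  return true;
}

// Walk a parsed tree in place. Elements that are not white listed are removed
// together with their contents; text nodes are kept; everything else is dropped.
function sanitizeTree(root, baseUrl) {
  // Copy the child list first because it changes while we remove nodes.
  for (const node of Array.from(root.childNodes)) {
    if (node.nodeType === 3) continue; // text node
    if (node.nodeType !== 1 || !ALLOWED_TAGS.has(node.tagName.toLowerCase())) {
      root.removeChild(node);
      continue;
    }
    const tag = node.tagName.toLowerCase();
    for (const attr of Array.from(node.attributes)) {
      if (!isAllowedAttribute(tag, attr.name.toLowerCase(), attr.value, baseUrl)) {
        node.removeAttribute(attr.name);
      }
    }
    sanitizeTree(node, baseUrl);
  }
}

// Browser usage (not run here):
// const doc = new DOMParser().parseFromString(remoteHtml, "text/html");
// sanitizeTree(doc.body, remoteSourceUrl);
```

Because inline handlers such as onclick are simply absent from the attribute lists, they never need to be matched explicitly. A maintained library is still likely a safer choice than a hand-rolled filter like this one.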

colithium
How would I go about that? It would have to cover white listed tags, white listed attributes, and I'd still have to parse attribute values, right?
Keith Bentrup
That could be very complex. Do you know of any working examples that take that approach?
Keith Bentrup
+1  A: 

don't forget to also include <frame> and <frameset> along with <iframe>

cobbal
+1 right, of course, thx.
Keith Bentrup
+1  A: 

For the sanitization part, are you looking for this?

If not, perhaps you could pick up a few tips from this code snippet.

But it goes without saying that prevention is better than cure. You are better off allowing only trusted sources than allowing everything and then sanitizing.

On a related note, you may want to take a look at this article, and its slashdot discussion.

Here Be Wolves
+1  A: 

It sounds like you want to do the following:

  • Insert snippets of static HTML into your web page
  • These snippets are requested via AJAX from a remote site.
  • You want to sanitise the HTML before injecting into the site, as this could lead to security problems like XSS.

If this is the case, then there are no easy ways to strip out 'bad' content in JavaScript. A whitelist solution is the best, but this can get very complex. I would suggest proxying requests for the remote content through your own server and sanitizing the HTML server side. There are various libraries that can do this. I would recommend either AntiSamy or HTMLPurifier.

For a completely browser-based way of doing this, you can use IE8's toStaticHTML method. However, no other browser currently implements this.
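Since toStaticHTML is only available in some browsers, any use of it needs feature detection and a fallback. A minimal sketch follows; the wrapper name `sanitizeInBrowser` and its `win` parameter (normally `window`) are hypothetical, introduced only so the check can be exercised outside a browser.

```javascript
// Written in ES3-style syntax (var, no arrow functions) because the browsers
// that offer toStaticHTML are old Internet Explorer versions.
// Returns the sanitized markup, or null when the method is unavailable.
function sanitizeInBrowser(html, win) {
  // Truthiness check instead of typeof === "function": as far as I know,
  // old IE may report host methods as "object" (assumption).
  if (win && win.toStaticHTML) {
    return win.toStaticHTML(html);
  }
  return null; // caller must fall back, e.g. to server-side sanitizing
}

// Browser usage (not run here; remoteHtml and container are placeholders):
// var clean = sanitizeInBrowser(remoteHtml, window);
// if (clean !== null) { container.innerHTML = clean; }
```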

Paul Stone