views:

1609

answers:

4

I need to display external resources loaded via cross domain requests and make sure to only display "save" content.

Could use Prototype's String#stripScripts to remove script blocks. But handlers such as onclick or onerror are still there.

Is there any library which can at least

  • strip script blocks,
  • kill DOM handlers,
  • remove black listed tags (eg: embed or object).

So are any JavaScript related links and examples out there?

A: 

i'm not sure if this is the way (dunno php so well), but you can take a look here. You have JS equivalent for htmlentities, htmlspecialcharacters, and so on.

Ionut Staicu
+2  A: 

You can't anticipate every possible weird type of malformed markup that some browser somewhere might trip over to escape blacklisting, so don't blacklist. There are many more structures you might need to remove than just script/embed/object and handlers.

Instead attempt to parse the HTML into elements and attributes in a hierarchy, then run all element and attribute names against an as-minimal-as-possible whitelist. Also check any URL attributes you let through against a whitelist (remember there are more dangerous protocols than just javascript:).

If the input is well-formed XHTML the first part of the above is much easier.

As always with HTML sanitisation, if you can find any other way to avoid doing it, do that instead. There are many, many potential holes. If the major webmail services are still finding exploits after this many years, what makes you think you can do better?

bobince
+5  A: 

Shameless plug: see http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js for a client side html sanitizer that has been thoroughly reviewed.

It is white-listed, not black-listed, but the whitelists are configurable as per http://code.google.com/p/google-caja/wiki/CajaWhitelists

Mike Samuel
+1  A: 

Never trust the client. If you're writing a server application, assume that the client will always submit unsanitary, malicious data. It's a rule of thumb that will keep you out of trouble. If you can, I would advise doing all validation and sanitation in server code, which you know (to a reasonable degree) won't be fiddled with. Perhaps you could use a serverside web application as a proxy for your clientside code, which fetches from the 3rd party and does sanitation before sending it to the client itself?

[edit] I'm sorry, I misunderstood the question. However, I stand by my advice. Your users will probably be safer if you sanitize on the server before sending it to them.

Sean Edwards