Hi,

I have two applications where users can submit HTML pages. I would like to make sure that no scripts are included in the HTML. Normally you would escape content to get rid of scripts, but since this content is HTML, I can't do that. Does anyone have good suggestions on how to do this? The applications are written in both C# and Java.

+1  A: 

The first thing I'd do is check whether there is a <script> tag in the HTML. That handles the most obvious case. Then you have to make sure there are no inline event handlers such as onmouseover, onclick, etc. You could use a DOM parser to walk over all elements and remove every attribute whose name starts with 'on'.
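The blacklist approach described above could be sketched in Java roughly as follows. This is only an illustration under a strong assumption: the submitted markup is well-formed XHTML, so the JDK's built-in `javax.xml` DOM parser can read it (real-world HTML usually needs a lenient HTML parser instead). The class name `ScriptStripper` is hypothetical. It removes `<script>` elements and any attribute whose name starts with "on"; as the comments below point out, this alone does not catch every way of injecting script.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: assumes the input is well-formed XHTML with a single root element.
public class ScriptStripper {

    public static String strip(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPEs so the parser cannot be abused for entity expansion tricks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            clean(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void clean(Element element) {
        // Remove on* attributes (onclick, onmouseover, ...); iterate backwards because the map is live.
        NamedNodeMap attrs = element.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            String name = attrs.item(i).getNodeName();
            if (name.toLowerCase(Locale.ROOT).startsWith("on")) {
                element.removeAttribute(name);
            }
        }
        // Snapshot the child elements first, since the tree is modified while walking it.
        NodeList children = element.getChildNodes();
        List<Element> childElements = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                childElements.add((Element) n);
            }
        }
        for (Element child : childElements) {
            if (child.getTagName().equalsIgnoreCase("script")) {
                element.removeChild(child); // drop the script element and its contents
            } else {
                clean(child);
            }
        }
    }
}
```

For example, `ScriptStripper.strip("<div onclick=\"x()\"><script>alert(1)</script><p>hi</p></div>")` keeps the `<div>` and `<p>hi</p>` but drops the script and the `onclick` handler.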

I have little to no experience with either C# or Java, so I'm not aware of any "easier" solutions that are already available. But maybe someone else here has a better idea.

CharlesLeaf
Usually it's much safer to decide on what you allow, rather than trying to remove the things you disallow.
sje397
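The allow-list idea can be sketched in Java as well. Again this assumes well-formed XHTML input parsed with the JDK's `javax.xml` DOM parser, and the class name `WhitelistSanitizer` and both allowed sets are hypothetical example policies, not a recommendation. Instead of deleting known-bad constructs, it rebuilds the markup and emits only elements and attributes on the list; everything else (including `<script>`, `on*` handlers, and `style`) is simply never written out.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical allow-list sanitizer for well-formed XHTML fragments.
public class WhitelistSanitizer {
    // Example policy only: the tags and attributes that are allowed to survive.
    private static final Set<String> ALLOWED_ELEMENTS =
            Set.of("p", "b", "i", "em", "strong", "ul", "ol", "li", "span", "div");
    private static final Set<String> ALLOWED_ATTRIBUTES = Set.of("title", "class");

    public static String sanitize(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Wrap the fragment so the parser sees a single root element.
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            StringBuilder out = new StringBuilder();
            emitChildren(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void emitChildren(Node parent, StringBuilder out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            emit(children.item(i), out);
        }
    }

    private static void emit(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(escape(node.getNodeValue()));
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            String tag = e.getTagName().toLowerCase(Locale.ROOT);
            if (!ALLOWED_ELEMENTS.contains(tag)) {
                return; // not on the list: drop the element and everything inside it
            }
            out.append('<').append(tag);
            NamedNodeMap attrs = e.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.item(i).getNodeName().toLowerCase(Locale.ROOT);
                if (ALLOWED_ATTRIBUTES.contains(name)) {
                    out.append(' ').append(name).append("=\"")
                       .append(escape(attrs.item(i).getNodeValue())).append('"');
                }
            }
            out.append('>');
            emitChildren(e, out);
            out.append("</").append(tag).append('>');
        }
        // Comments, processing instructions and other node types are dropped.
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With this policy, `sanitize("<p title=\"t\" onclick=\"x()\">Hi <b>there</b><script>alert(1)</script></p>")` yields `<p title="t">Hi <b>there</b></p>`. URL-bearing attributes such as `href` and `src` are deliberately left off the list here, because allowing them safely also requires checking their values.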
@sje397 I got the impression that he doesn't want to allow any JavaScript. And you'd still need to "remove" the things you disallow, right? If you do want to allow some JavaScript, there are other security risks, such as reading document.cookie and sending it to a remote server (session hijacking), redirecting the page, etc.
CharlesLeaf
Ah, then I misunderstood your first comment. I agree; that is the best way to go.
CharlesLeaf
Yes, this is all true. The problem, though, is that JavaScript can come in many different flavors. There are numerous ways of inserting JavaScript without using the <script> tag. This page has a lot of good examples: http://ha.ckers.org/xss.html
Tomas
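One of the non-<script> vectors in that kind of list is a `javascript:` URL placed in an `href` or `src` attribute, often obfuscated with mixed case or embedded whitespace. A hedged Java sketch of an allow-list check for such attribute values follows; the class name `UrlCheck` and the set of allowed schemes are assumptions for illustration. It should run on the already-decoded attribute value (a DOM parser decodes character references for you), and browsers tolerate more obfuscation tricks than this covers, so treat it as a starting point rather than a complete defence.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical check for URL-valued attributes such as href and src.
public class UrlCheck {
    // Example policy: only these schemes are accepted; relative URLs have no scheme.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https", "mailto");

    public static boolean isSafeUrl(String url) {
        // Strip whitespace and control characters, which browsers may ignore
        // inside a scheme (e.g. "java\tscript:").
        StringBuilder sb = new StringBuilder();
        for (char c : url.toCharArray()) {
            if (!Character.isWhitespace(c) && !Character.isISOControl(c)) {
                sb.append(c);
            }
        }
        String cleaned = sb.toString().toLowerCase(Locale.ROOT);
        int colon = cleaned.indexOf(':');
        if (colon < 0) {
            return true; // no scheme at all: a relative URL
        }
        // A ':' that comes after '/', '?' or '#' is part of the path, query or fragment.
        for (int i = 0; i < colon; i++) {
            char c = cleaned.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                return true;
            }
        }
        return ALLOWED_SCHEMES.contains(cleaned.substring(0, colon));
    }
}
```

Under this policy `isSafeUrl("http://example.com")` is accepted, while `isSafeUrl(" JaVa\tScRiPt:alert(1)")` and `isSafeUrl("vbscript:x")` are rejected.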
And we also talked about those other ways ;) Another one would be CSS expressions in Internet Explorer. There are many ways, yes.
CharlesLeaf
+1  A: 

OWASP has a project to scrub HTML and CSS.

epascarello
This looks interesting; I will have a look at it.
Tomas