
I'm looking for a class/util etc. to sanitize HTML code, i.e. remove dangerous tags, attributes and values to avoid XSS and similar attacks.

I get HTML code from a rich text editor (e.g. TinyMCE), but it can also be sent maliciously some other way, bypassing TinyMCE's validation ("Data submitted from off-site").

Is there anything as simple to use as InputFilter in PHP? The perfect solution I can imagine would work like this (assume the sanitizer is encapsulated in an HtmlSanitizer class):

String unsanitized = "...<...>...";           // some potentially dangerous HTML on input

HtmlSanitizer sat = new HtmlSanitizer();      // sanitizer util class created

String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...

Update: the simpler the solution, the better! A small util class with as few external dependencies on other libraries/frameworks as possible would be best for me.


How about that?

A: 

One of the most basic ways to prevent XSS attacks is to HTML-encode special characters: <, >, &, " and ' become &lt;, &gt;, &amp;, &quot; and &apos;. You can do this very easily with a plain old string replace. This is a good start.
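A minimal sketch of that string-replace approach in Java, assuming a hypothetical HtmlEncoder class (not part of any library). Two details matter: & has to be replaced first, otherwise the & inside entities produced by the later replacements would be escaped a second time; and the sketch uses &#39; for the apostrophe, because &apos; is not defined in HTML 4.

```java
// Hypothetical helper illustrating the "plain old string replace" encoding.
final class HtmlEncoder {

    private HtmlEncoder() {
        // static utility, not meant to be instantiated
    }

    static String encode(String input) {
        if (input == null) {
            return null;
        }
        // '&' goes first so the entities added below are not re-escaped.
        return input.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;"); // numeric form works in HTML 4 and 5
    }
}
```

For example, HtmlEncoder.encode("<script>alert('x')</script>") returns &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;, which a browser displays as text instead of running it.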

For more information, check out http://htmlencode.net/

Rahul
Encoding HTML tags and special characters is not a good solution here: it would convert the HTML into something like plain text. HTML comes in, and HTML (sanitized, i.e. safe) must come out.
WildWezyr
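One zero-dependency way to get "HTML in, HTML out" is encode-then-allowlist: escape everything, then re-enable only the exact escaped forms of a few attribute-free tags. The sketch below reuses the HtmlSanitizer name and sanitize method from the question's wished-for usage; it is a hypothetical illustration of that technique, not AntiSamy or any library's API. Because every attribute is dropped, it cannot preserve links, images or styling produced by TinyMCE; a policy-driven library is the better fit when those must survive.

```java
// Hypothetical minimal allowlist sanitizer (encode-then-allowlist technique).
final class HtmlSanitizer {

    // Tags restored only in their exact lowercase, attribute-free form,
    // e.g. "<b>" and "</b>"; anything else stays escaped text.
    private static final String[] ALLOWED_TAGS =
            {"b", "i", "u", "em", "strong", "p", "br"};

    public String sanitize(String unsanitized) {
        if (unsanitized == null) {
            return null;
        }
        // Step 1: escape everything so no markup from the input is live.
        String result = unsanitized.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
        // Step 2: turn back on only exact escaped tags from the allowlist.
        // A tag with attributes (e.g. onclick) never matches, so it stays text.
        for (String tag : ALLOWED_TAGS) {
            result = result.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                    .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return result;
    }
}
```

With this sketch, new HtmlSanitizer().sanitize("<b onclick=\"x()\">hi</b>") returns &lt;b onclick=&quot;x()&quot;&gt;hi</b>: the tag carrying an event handler is rendered inert. Note that it does not check nesting, so an unmatched closing tag such as </b> passes through; that cannot run script, but it can affect the surrounding layout.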
A: 
Vineet Reynolds
This would require rebuilding the architecture of my whole project, and I'm not willing to do that. I need something simple, without many dependencies, that doesn't force me to change how my code is organized (I like it the way it is now). So I just need a util class to do the work. I've updated my question to clarify that requirement.
WildWezyr
I'm not sure what you mean by rebuilding the architecture of the project. AntiSamy fits your requirement perfectly: it lets you feed rich text editor input into a filtering library driven by a site policy.
Vineet Reynolds
Hmm. It seems you are right! I just assumed it was a big, heavy framework like Struts or Spring that works as some kind of servlet filter ;-). Probably the big letters in the name ("OWASP") misled me. BTW, what exactly are the dependencies of OWASP AntiSamy? What else will I need to use it?
WildWezyr
The AntiSamy POM might give you a hint (the link below is from SVN and should not be used directly). It does need a couple of other libraries, but I'm not sure how AntiSamy uses them internally. Ref: http://code.google.com/p/owaspantisamy/source/browse/trunk/Java/current/antisamy-project/antisamy/pom.xml?r=149
Vineet Reynolds