tags:

views:

99

answers:

3

Hi, I'm writing a servlet-based application in which I need to provide a messaging system. I'm in a rush, so I chose CKEditor to provide editing capabilities, and I currently insert the generated HTML directly into the web page that displays all messages (messages are stored in a MySQL database, FYI). CKEditor already filters HTML based on a whitelist, but a user can still inject malicious code with a crafted POST request, so this is not enough.

A good library already exists to prevent XSS attacks by filtering HTML tags, but it's written in PHP: HTML Purifier.

So, is there a similar mature library that can be used in Java? A simple string replacement based on a whitelist doesn't seem to be enough, since I'd like to filter malformed tags too (which could alter the design of the page on which the message is displayed).
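To illustrate the concern, here is a minimal sketch of the kind of naive whitelist filter I mean. The class name NaiveWhitelistFilter, the allowed tag set and the regex are all made up for this example and do not come from any library. It strips tags whose name is not allowed, but it lets unclosed tags and event-handler attributes on allowed tags straight through.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: a deliberately naive tag-name whitelist.
class NaiveWhitelistFilter {

    // Assumed whitelist of formatting tags (illustrative only).
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("b", "i", "p"));

    // Matches an opening or closing tag and captures the tag name.
    private static final Pattern TAG =
            Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Removes every tag whose name is not whitelisted; keeps the rest verbatim.
    static String filter(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase();
            String replacement = ALLOWED.contains(name)
                    ? Matcher.quoteReplacement(m.group())
                    : "";
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Disallowed tags are stripped: prints alert(1)
        System.out.println(filter("<script>alert(1)</script>"));

        // The unclosed <b> survives, so the rest of the page would turn bold.
        System.out.println(filter("<b>unclosed bold"));

        // The onmouseover attribute survives because only the tag name is checked.
        System.out.println(filter("<b onmouseover=\"steal()\">hi</b>"));
    }
}
```

The last two inputs come back unchanged, which is why I think I need something that actually parses the markup and rebuilds it, rather than matching strings.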

If there isn't, can you give me directions on how to proceed? An XML parser seems overkill.

Thank you!

Note: There are a lot of questions about this on SO, but all the answers are about filtering out ALL HTML tags; I want to keep valid formatting tags.

+2  A: 

You should use AntiSamy (that's what I did).

Thierry-Dimitri Roy
I missed this question, thank you. Certainly a duplicate.
Samuel_xL
+4  A: 

I'd recommend using Jsoup for this. Here's the relevant extract from its site.

Sanitize untrusted HTML

Problem

You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.

Solution

Use the jsoup HTML Cleaner with a configuration specified by a Whitelist.

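import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // newer jsoup releases rename this class to Safelist

// Untrusted input: the onclick handler is what must be stripped.
String unsafe =
      "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>";
String safe = Jsoup.clean(unsafe, Whitelist.basic());
// now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>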

Jsoup offers more advantages than that as well. See also Pros and Cons of HTML parsers in Java.

BalusC
+1  A: 

If none of the ready-made options seem like enough, there is an excellent series of articles on XSS and attack prevention at Google Code. It should provide plenty of information to work with, if you end up going down that path.

wilsona