Overflowed Stack,

I have a Java web application (Tomcat) in which I allow the user to upload HTML code through a form.

Since I am running on Tomcat and I actually display the user-uploaded HTML, I do not want a user to inject malicious code (JSP tags, scriptlets, EL) and have it executed on the server. I want to filter out any JSP/non-HTML content.

Writing a parser myself seems too onerous, not least because of the many subtleties one has to take care of (comments, byte representation of the scripts, etc.).

Do you know of any API/library which does this for me ? I know about Caja filtering, but am looking at something specifically for JSPs.

Many Thanks, JP, Malta.

A: 

I'm not sure I have understood your question completely, but if you want to remove all content surrounded by "<%@ .. %>", you can replace it with a regex.

String resultString = subjectString.replaceAll("(?sim)<%@ .*? %>", "");
Floyd
That is too onerous to maintain. What about <jsp:include> and a thousand other tags? Note that the namespace might be renamed, e.g. <groovy:include>. This is why I am looking for a library to do that.
+2  A: 

Don't worry about executing JSP code. Your JSP will be turned into a servlet once, so you will have something like:

out.println(contents);

and the contents won't be evaluated as JSP code. But you must worry about malicious JavaScript.
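
As a rough illustration of that point, here is a minimal sketch that reads a stored upload as a plain string and writes it to any `Writer`. The class and method names (`UploadedContentWriter`, `writeAsData`) are made up for this example, and the UTF-8 encoding is an assumption. In a servlet, the `Writer` would typically be the response writer. Nothing in this path invokes the JSP compiler, so `<% ... %>` in the upload is emitted as text.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UploadedContentWriter {

    /**
     * Hypothetical helper: copies the stored upload to the given writer
     * as data. The content is never compiled or interpreted here, so any
     * JSP syntax inside it reaches the client as literal characters.
     */
    public static void writeAsData(Path storedUpload, Writer out) throws IOException {
        // Assumption: uploads are stored as UTF-8 text.
        String contents = new String(Files.readAllBytes(storedUpload), StandardCharsets.UTF_8);
        out.write(contents);
    }
}
```

This only addresses server-side execution. The browser will still interpret whatever HTML and JavaScript the upload contains.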

Bozho
The uploaded HTML gets saved in the file system and is served directly to the user... So, yes I must worry about JSP tags which may be injected by the user and run on the server.
If it gets saved as a `.jsp`, that is the wrong way to do it. Save it as `.txt` and load it as a string.
Bozho
Exactly. There seems to be a major misconception here.
BalusC
I **have** to save it as a JSP as we add scriptlets to it to do particular fancy things.
Instead of adding scriptlets to files, can't you use a template with those scriptlets?
Bozho
A: 

I don't have a library to remove JSP tags, but you can write a little one based on regular expressions that would:

  • delete all "<% %>" tags
  • delete all HTML tags that contain the ':' character (to avoid namespaced tags such as "<jsp:include>", for example)

I don't know whether all potentially malicious Java code is covered by these two filters, but it is a good start...
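
A minimal sketch of those two filters in Java follows. The class name `JspTagStripper` and both patterns are illustrative assumptions, not a vetted sanitizer. It repeats the replacement until the text stops changing, so that removing one match cannot splice together a new `<% ... %>`. It does not handle EL expressions such as `${...}`, and a `>` inside a quoted attribute value will cut a tag match short.

```java
import java.util.regex.Pattern;

public class JspTagStripper {

    // Any <% ... %> block: scriptlets, expressions, declarations,
    // directives (<%@ ... %>) and JSP comments (<%-- ... --%>).
    private static final Pattern SCRIPTLET =
            Pattern.compile("<%.*?%>", Pattern.DOTALL);

    // Opening, closing, or self-closing tags whose name contains a colon,
    // e.g. <jsp:include page="x.jsp"/> or </c:if>.
    private static final Pattern NAMESPACED_TAG =
            Pattern.compile("</?[A-Za-z_][\\w.-]*:[\\w.-]+(?:\\s[^>]*)?>", Pattern.DOTALL);

    /** Applies both filters repeatedly until a pass removes nothing. */
    public static String strip(String html) {
        String previous;
        String current = html;
        do {
            previous = current;
            current = SCRIPTLET.matcher(current).replaceAll("");
            current = NAMESPACED_TAG.matcher(current).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }
}
```

For example, `JspTagStripper.strip("<p>Hi</p><% int x = 1; %><jsp:include page=\"x.jsp\"/>")` leaves only `<p>Hi</p>`. Ordinary tags such as `<a href="http://example.com">` are untouched because the colon is not part of the tag name.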

Another solution, though a little more complicated: use an HTTP proxy server (Apache httpd, Nginx, etc.) that serves static resources (CSS, images, HTML pages) directly and forwards only dynamic resources (JSP and .do actions, for example) to Tomcat. When a file is uploaded, you force the file extension to ".html". Thanks to the HTTP proxy, you can then be sure that the file will not be interpreted by Tomcat.

Benoit Courtine
A: 

If the pages supplied by the users aren't mentioned in the web.xml and you don't have a rule "anything that ends with *.jsp is a JSP" in web.xml, Tomcat won't try to compile/run them.

What is much more important: you must filter the HTML, or users could add arbitrary JavaScript that steals other users' passwords. This is non-trivial. Try to clean the code with JTidy to get XML, then remove all <script>, <link>, and <object> tags, maybe even <img> (unless you make sure the URLs supplied are valid; some buggy browsers might run JavaScript if the image source is actually text/javascript). Also remove all CSS styles and make sure any href points to a safe URL. Don't forget <iframe>, <applet>, and all the other things that might break your secure shell.

[EDIT] That should give you an idea of where this is going. In the end, you should do the reverse: allow only a very small subset of HTML, if any at all. Most sites (like this one) use special markup for formatting for two reasons:

  1. It's simpler for the user
  2. It's more secure
Aaron Digulla
This is not exact. If you have a jsp page you do not need to list it in web.xml for it to run. The user uploaded content gets saved to a jsp page, after I do some processing on it.
So you **intentionally** create a new JSP? That sounds *very* dangerous to me. If someone proposed that to me, I'd say "No" or "It will take me half a year to get right." Don't do it if you can avoid it.
Aaron Digulla
I know, crazy as it may sound - we intentionally create a new JSP file...
Well, since the people who decided this obviously don't care about security in any way, why bother with filtering?
Aaron Digulla
+1  A: 

Using a library for content cleaning is better than trying to do it yourself with e.g. Regexes.

Try AntiSamy from the Open Web Application Security Project (OWASP).

http://www.owasp.org/index.php/Antisamy

I haven't used it (yet), but it seems suitable. JSP content should be automatically removed or escaped by the HTML normalization.

Edit, just found these:
http://stackoverflow.com/questions/2774074/best-practice-user-generated-html-cleaning
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Markus Kull
Interesting post. It would be even more interesting to compare it to Google's Caja, which seems to be the de facto standard in this area.
I didn't know about Caja before; this is interesting. It seems to be especially suited to embedding third-party widgets.
Markus Kull
+2  A: 

Just save it as *.html, not as *.jsp, then it won't be passed through the JspServlet which does all the taglib/EL processing work. All taglibs/EL will end up plain (unparsed) in response.
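
A minimal sketch of choosing the storage name on upload, so that the stored file always ends in `.html` whatever the user submitted. The class and method names (`UploadNaming`, `htmlTarget`) and the fallback name `upload` are assumptions for illustration. It only helps if `*.html` is not mapped to the JspServlet in your configuration.

```java
import java.nio.file.Path;

public class UploadNaming {

    /**
     * Hypothetical helper: derives a storage path inside uploadDir that
     * always ends in ".html", regardless of the submitted file name.
     */
    public static Path htmlTarget(Path uploadDir, String submittedName) {
        // Keep only the last path segment so "../../x.jsp" cannot escape uploadDir.
        String name = submittedName.replace('\\', '/');
        name = name.substring(name.lastIndexOf('/') + 1);

        // Drop the submitted extension (".jsp", ".jspx", ...).
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            name = name.substring(0, dot);
        }

        // Replace anything outside a conservative character set.
        name = name.replaceAll("[^A-Za-z0-9_-]", "_");
        if (name.isEmpty()) {
            name = "upload"; // assumed fallback for names like ".jsp"
        }
        return uploadDir.resolve(name + ".html");
    }
}
```

With this, a submitted name of `../../evil.jsp` is stored as `evil.html` inside the upload directory.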

BalusC
I was just about to edit my answer adding that :) +1
Bozho
Thanks Balus, but we cannot do that as we add JSP scriptlets to the user-uploaded content ourselves (so we need to render those).