I am using Clojure/Ring/Compojure-0.4/Enlive stack to build a web application.

Are there functions in this stack that would either strip HTML or HTML-encode (i.e. <a> to &lt;a&gt;) user-supplied strings in order to prevent XSS attacks?

+6  A: 

Update: I knew there had to be more than that...

ring.util.codec from ring-core has functions called url-encode and url-decode, which work like so:

user> (require '[ring.util.codec :as c])
nil
user> (c/url-encode "<a>")
"%3Ca%3E"
user> (c/url-decode "%3Ca%3E")
"<a>"

These are wrappers around java.net.URLEncoder and java.net.URLDecoder. The same namespace provides functions for dealing with Base64 encoding, based on a class from Apache Commons.
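
For reference, the underlying Java classes can also be called directly from Clojure. This is only a minimal sketch, assuming UTF-8 as the character encoding; the ring wrappers may choose their defaults differently.

```clojure
(import '[java.net URLEncoder URLDecoder])

;; Percent-encode a string for use in a URL (UTF-8 is an assumption here).
(URLEncoder/encode "<a>" "UTF-8")
;; => "%3Ca%3E"

;; Decoding reverses the encoding.
(URLDecoder/decode "%3Ca%3E" "UTF-8")
;; => "<a>"
```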


Original answer follows.

I'm not sure whether there is a public function to do this, but Enlive has two private functions called xml-str and attr-str which do this:

(defn- xml-str
 "Like clojure.core/str but escapes < > and &."
 [x]
  (-> x str (.replace "&" "&amp;") (.replace "<" "&lt;") (.replace ">" "&gt;")))

(attr-str also escapes ".)

You could get at that function with @#'net.cgrand.enlive-html/xml-str (Clojure doesn't tend to make things really private...) or just copy it to your own namespace.
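
If you go the copy route, something along these lines should do. The names escape-text and escape-attr are made up for this sketch, and the attribute variant is only a guess at what attr-str does, based on the description above rather than on Enlive's actual source.

```clojure
(defn escape-text
  "Like str, but escapes & < and >. Hypothetical copy of Enlive's xml-str."
  [x]
  (-> (str x)
      (.replace "&" "&amp;")   ; runs first so the entities added below are not re-escaped
      (.replace "<" "&lt;")
      (.replace ">" "&gt;")))

(defn escape-attr
  "Like escape-text, but also escapes double quotes for attribute values."
  [x]
  (-> (escape-text x)
      (.replace "\"" "&quot;")))

(escape-text "<a href=\"x\">Tom & Jerry</a>")
;; => "&lt;a href=\"x\"&gt;Tom &amp; Jerry&lt;/a&gt;"

(escape-attr "say \"hi\"")
;; => "say &quot;hi&quot;"
```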

Michał Marczyk
That's a bit of a bummer. Sounds like a major oversight in most Clojure web frameworks.
Alex B
Apparently the situation isn't quite so bad: see the updated answer. :-)
Michał Marczyk
Looks like I was a little hasty to blame Enlive, but thanks anyway. :)
Alex B
URL encoding really isn't the same as HTML encoding: url-encode("<a>") => "%3Ca%3E", whereas html-encode("<a>") => "&lt;a&gt;"
Siddhartha Reddy
Siddhartha Reddy: Right. I seem to have forgotten the spec between posting the original answer and making the edit -- thanks for pointing it out. At least both options do make user input safe. *(sigh)* Anyway, `xml-str` does use `&...;` entities; too bad it's private. And of course Brian's answer is really the best fit to the question as stated above.
Michał Marczyk
+6  A: 

hiccup.core/escape-html in hiccup does it. That function used to be in Compojure itself (since all of the functionality in hiccup used to be part of Compojure). It's a simple enough function that you could easily write it yourself though.

(defn escape-html
  "Change special characters into HTML character entities."
  [text]
  (.. #^String (as-str text)
    (replace "&" "&amp;")
    (replace "<" "&lt;")
    (replace ">" "&gt;")
    (replace "\"" "&quot;")))

There's also clojure.contrib.string/escape, which takes a map of char -> string escape sequences and a string and escapes it for you.

user> (clojure.contrib.string/escape {\< "&lt;" \> "&gt;"} "<div>foo</div>")
"&lt;div&gt;foo&lt;/div&gt;"

This strikes me as not as useful as it could be, because you might want to escape multi-character sequences and this won't let you. But it might work for your HTML-escaping needs.
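
On Clojure 1.2 and later, clojure.string/escape does the same job without contrib. Note that, unlike the contrib version shown above, it takes the string first and the map second. Here is a sketch with a fuller character map; the name html-escapes is just for illustration.

```clojure
(require '[clojure.string :as str])

;; Map of single characters to their HTML entity replacements.
(def html-escapes
  {\& "&amp;", \< "&lt;", \> "&gt;", \" "&quot;"})

;; Each character is looked up once, so "&" in the output is never re-escaped.
(str/escape "<div class=\"x\">a & b</div>" html-escapes)
;; => "&lt;div class=&quot;x&quot;&gt;a &amp; b&lt;/div&gt;"
```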

And then there are many Java libraries for this, of course. You could use StringEscapeUtils from Apache Commons:

(org.apache.commons.lang.StringEscapeUtils/escapeHtml some-string)

This strikes me as a bit heavyweight for this purpose though.

Brian Carper
The correct URL for StringEscapeUtils: http://commons.apache.org/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html
grm
+1  A: 

It turns out Enlive does escape HTML by default if you use net.cgrand.enlive-html/content to put text into an HTML element.

(sniptest "<p class=\"c\"></p>" [:.c] (content "<script></script>"))
"<p class=\"c\">&lt;script&gt;&lt;/script&gt;</p>"
Alex B