Suppose that I take a user-supplied string, userstring, and call (keyword userstring) on it.

Are there any security concerns about doing this? And if so, what would be the best way to mitigate them?

+4  A: 

Off the top of my head:

(keyword s) will create a non-namespaced keyword with name s regardless of whether such a keyword could be represented by a keyword literal. That could be a security concern if you were to print those keywords out as part of some configuration file, say, and then read that file back in and treat its contents as trusted code:

(with-out-str (println (keyword "foo (println :bar)")))
; => :foo (println :bar)

Also, here are two threads of interest from the Google groups (the first one is from clojure-dev):

  1. Request for Improvement (with patch): non-interning keyword lookup

  2. Are keywords and symbols garbage-collected?

A summary: interning garbage keywords could be a memory leak, so you should consider doing some preprocessing on strings which you might intern if they come from untrusted sources.
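One possible form of that preprocessing, as a minimal sketch: never call keyword on the raw input at all, and instead look the string up in a fixed table of keys the application already knows about. The names known-keys and string->key and the sample keys are made up for illustration.

```clojure
;; Hypothetical table of the only keys this application accepts.
(def known-keys
  {"name"  :name
   "email" :email})

(defn string->key
  "Returns the keyword registered for userstring, or nil when the
  string is not one of the known keys. No new keyword is interned."
  [userstring]
  (get known-keys userstring))

(string->key "email")              ; a known key
(string->key "foo (println :bar)") ; unknown input yields nil
```

Because unknown strings map to nil, untrusted input cannot grow the set of interned keywords.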

Michał Marczyk
Very helpful, thank you.
Rob Lachlan
+3  A: 

Per http://clojure.org/reader, there are rules for which characters are valid in symbols and keywords. (For now, alphanumeric characters and *, +, !, -, _, and ?.) You should never create a symbol containing any other characters. However, right now, these rules are completely unenforced by the compiler.

At best you could end up with invalid keywords. At worst you could end up with evil/dangerous ones, as Michał Marczyk said. Keep in mind that #=() can be used to run arbitrary code at read-time, so you don't even have to evaluate a string for bad things to happen, you only have to read it.

(keyword "foo #=(steal-passwords-and-delete-hard-drive)")

(See (doc *read-eval*) for how to disable this behavior; note that *read-eval* is enabled by default.)
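As a rough sketch of what disabling it looks like: the snippet below reads a string containing #= in two ways that both refuse to evaluate it. It assumes a Clojure version that ships clojure.edn (1.5 or later); the helper throws? is only there for illustration.

```clojure
(require '[clojure.edn :as edn])

(defn throws?
  "Illustrative helper: true when calling f throws an exception."
  [f]
  (try (f) false (catch Exception _ true)))

(def payload "#=(+ 1 2)")

;; With *read-eval* bound to false, the reader rejects #= forms.
(throws? #(binding [*read-eval* false]
            (read-string payload)))

;; The EDN reader has no #= dispatch at all, so it also rejects the form.
(throws? #(edn/read-string payload))

;; Ordinary data still reads normally through the EDN reader.
(edn/read-string ":foo")
```

For data coming from untrusted sources, the EDN reader is probably the safer default, since it does not depend on remembering to rebind a var.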

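I think general rules for sanitizing user input apply here. Define precisely what you want to allow, and disallow everything else by default. Maybe allow something like the regex #"[a-zA-Z0-9*+!_?-]+", with possibly other alphanumerics depending on the language you speak. Note that the hyphen goes last in the character class: written as !-_ it would be interpreted as a range from ! to _, which also admits characters such as #, =, ( and ).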
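A minimal sketch of that whitelist approach follows. The names keyword-name-pattern and safe-keyword, and the length cap of 64, are assumptions for illustration rather than anything Clojure prescribes. The hyphen is placed last in the character class so that it is a literal hyphen, and re-matches is used so that the whole string has to match.

```clojure
;; Whitelist of characters allowed in a keyword name (hyphen last = literal).
(def keyword-name-pattern #"[a-zA-Z0-9*+!_?-]+")

(defn safe-keyword
  "Returns (keyword s) when s is a non-empty string of whitelisted
  characters no longer than max-len; otherwise returns nil.
  The default max-len of 64 is an arbitrary illustrative limit."
  ([s] (safe-keyword s 64))
  ([s max-len]
   (when (and (string? s)
              (<= (count s) max-len)
              (re-matches keyword-name-pattern s))
     (keyword s))))

(safe-keyword "foo-bar")            ; accepted
(safe-keyword "foo (println :bar)") ; rejected: space, parens, colon
(safe-keyword "foo #=(evil)")       ; rejected: #, =, parens
```

Returning nil for rejected input leaves the decision about how to report the error to the caller; throwing an exception would be an equally reasonable design.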

Brian Carper
"#=() can be used to run arbitrary code at read-time, so you don't even have to evaluate a string for bad things to happen, you only have to read it." That I didn't know, thank you.
Rob Lachlan