views:

867

answers:

5

Are there any security exploits that could occur in this scenario:

eval(repr(unsanitized_user_input), {"__builtins__": None}, {"True":True, "False":False})

where unsanitized_user_input is a str object. The string is user-generated and could be nasty. Assuming our web framework hasn't failed us, it's a real honest-to-god str instance from the Python builtins.

If this is dangerous, can we do anything to the input to make it safe?

We definitely don't want to execute anything contained in the string.

See also:

The larger context which is (I believe) not essential to the question is that we have thousands of these:

repr([unsanitized_user_input_1,
      unsanitized_user_input_2,
      unsanitized_user_input_3,
      unsanitized_user_input_4,
      ...])

in some cases nested:

repr([[unsanitized_user_input_1,
       unsanitized_user_input_2],
      [unsanitized_user_input_3,
       unsanitized_user_input_4],
       ...])

which are themselves converted to strings with repr(), put in persistent storage, and eventually read back into memory with eval.

Eval deserialized the strings from persistent storage much faster than pickle and simplejson. The interpreter is Python 2.5 so json and ast aren't available. Due to the App Engine environment no C modules are allowed and cPickle is not allowed.

+14  A: 

It is indeed dangerous and the safest alternative is ast.literal_eval (see the ast module in the standard library). You can of course build and alter an ast to provide e.g. evaluation of variables and the like before you eval the resulting AST (when it's down to literals).

The possible exploit of eval starts with any object it can get its hands on (say True here) and going via .__class_ to its type object, etc. up to object, then gets its subclasses... basically it can get to ANY object type and wreck havoc. I can be more specific but I'd rather not do it in a public forum (the exploit is well known, but considering how many people still ignore it, revealing it to wannabe script kiddies could make things worse... just avoid eval on unsanitized user input and live happily ever after!-).

Alex Martelli
+3  A: 

Generally, you should never allow anyone to post code.

So called "paid professional programmers" have a hard-enough time writing code that actually works.

Accepting code from the anonymous public -- without benefit of formal QA -- is the worst of all possible scenarios.

Professional programmers -- without good, solid formal QA -- will make a hash of almost any web site. Indeed, I'm reverse engineering some unbelievably bad code from paid professionals.

The idea of allowing a non-professional -- unencumbered by QA -- to post code is truly terrifying.

S.Lott
+8  A: 

If you can prove beyond doubt that unsanitized_user_input is a str instance from the Python built-ins with nothing tampered, then this is always safe. In fact, it'll be safe even without all those extra arguments since eval(repr(astr)) = astr for all such string objects. You put in a string, you get back out a string. All you did was escape and unescape it.

This all leads me to think that eval(repr(x)) isn't what you want--no code will ever be executed unless someone gives you an unsanitized_user_input object that looks like a string but isn't, but that's a different question--unless you're trying to copy a string instance in the slowest way possible :D.

Hao Lian
That's exactly right; I definitely don't want anything in the string to be executed. The reason for doing this would make much more sense if I presented the larger context but I tried to simplify the scenario for the question.
Brandon Thomson
+2  A: 

With everything as you describe, it is technically safe to eval repred strings, however, I'd avoid doing it anyway as it's asking for trouble:

  • There could be some weird corner-case where your assumption that only repred strings are stored (eg. a bug / different pathway into the storage that doesn't repr instantly becmes a code injection exploit where it might otherwise be unexploitable)

  • Even if everything is OK now, assumptions might change at some point, and unsanitised data may get stored in that field by someone unaware of the eval code.

  • Your code may get reused (or worse, copy+pasted) into a situation you didn't consider.

As Alex Martelli pointed out, in python2.6 and higher, there is ast.literal_eval which will safely handle both strings and other simple datatypes like tuples. This is probably the safest and most complete solution.

Another possibility however is to use the string-escape codec. This is much faster than eval (about 10 times according to timeit), available in earlier versions than literal_eval, and should do what you want:

>>> s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
>>> repr(s)[1:-1].decode('string-escape') == s
True

(The [1:-1] is to strip the outer quotes repr adds.)

Brian
+1  A: 
repr([unsanitized_user_input_1,
      unsanitized_user_input_2,
      ...

... unsanitized_user_input is a str object

You shouldn't have to serialise strings to store them in a database..

If these are all strings, as you mentioned - why can't you just store the strings in a db.StringListProperty?

The nested entries might be a bit more complicated, but why is this the case? When you have to resort to eval to get data from the database, you're probably doing something wrong..

Couldn't you store each unsanitized_user_input_x as it's own db.StringProperty row, and have group them by an reference field?

Either of those may not be applicable, since I've no idea what you're trying to achieve, but my point is - can you not structure the data in a way you where don't have to rely on eval (and also rely on it not being a security issue)?

dbr
One reason is that the strings need to be compressed to fit in App Engine's 1MB entity limit, and I thought the overhead of compressing 1000s of strings individually would probably be much higher than serializing them, compressing them all together, and putting them into a blob. The space savings would probably be less too.But it's a good point... I'm definitely looking for ways to avoid eval without increasing running cost too much.
Brandon Thomson