ansaurus

Question

Safety of Python 'eval' For List Deserialization

Answer 1

+14 A:

It is indeed dangerous, and the safest alternative is ast.literal_eval (see the ast module in the standard library). You can, of course, build and alter an AST to provide, e.g., evaluation of variables and the like before you eval the resulting AST (once it's down to literals).
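A minimal sketch, assuming modern Python 3 (which postdates this thread), of round-tripping a list of strings through repr() and ast.literal_eval. The sample strings are made up for illustration; the last one looks like code but comes back as plain text:

```python
import ast

# Serialize a list of user-supplied strings with repr(), as in the question.
items = ["hello", "it's \"quoted\"", "line1\nline2", "__import__('os')"]
serialized = repr(items)

# ast.literal_eval accepts only literal syntax (strings, numbers, tuples,
# lists, dicts, sets, booleans, None), so nothing in the text is executed.
restored = ast.literal_eval(serialized)
print(restored == items)  # True
```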

The possible exploit of eval starts with any object it can get its hands on (say True here) and goes via .__class__ to its type object, etc., up to object, then gets its subclasses... basically it can get to ANY object type and wreak havoc. I could be more specific, but I'd rather not do it in a public forum (the exploit is well known, but considering how many people still ignore it, revealing it to wannabe script kiddies could make things worse... just avoid eval on unsanitized user input and live happily ever after!-).
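A deliberately harmless illustration of only the first step described above, assuming Python 3: even with builtins stripped from the globals, plain attribute access on a literal reaches `object`, whose `__subclasses__()` then lists loaded classes. The expression is illustrative and stops well short of any exploit:

```python
# Empty builtins: names like open or __import__ are unavailable inside eval.
restricted_globals = {"__builtins__": {}}

# Attribute access needs no builtins: True -> bool -> its MRO -> object.
reached = eval("True.__class__.__mro__[-1]", restricted_globals, {})
print(reached)  # <class 'object'>

# From object, every loaded subclass is reachable, which is the real problem.
subclasses = reached.__subclasses__()
print(len(subclasses) > 0)  # True
```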

Alex Martelli 2009-07-11 01:33:02

Answer 2

+3 A:

Generally, you should never allow anyone to post code.

So-called "paid professional programmers" have a hard enough time writing code that actually works.

Accepting code from the anonymous public -- without benefit of formal QA -- is the worst of all possible scenarios.

Professional programmers -- without good, solid formal QA -- will make a hash of almost any web site. Indeed, I'm reverse engineering some unbelievably bad code from paid professionals.

The idea of allowing a non-professional -- unencumbered by QA -- to post code is truly terrifying.

S.Lott 2009-07-11 01:38:39

Answer 3

+8 A:

If you can prove beyond doubt that unsanitized_user_input is a str instance from the Python built-ins that hasn't been tampered with, then this is always safe. In fact, it'll be safe even without all those extra arguments, since eval(repr(astr)) == astr for all such string objects. You put in a string, you get back out a string. All you did was escape and unescape it.
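A quick check of that round-trip claim, assuming Python 3 and genuine built-in str objects (the sample strings are arbitrary):

```python
samples = ["plain", "it's", 'say "hi"', "tab\there", "nul\x00byte", "caf\u00e9"]

for s in samples:
    # repr() yields a quoted string literal; evaluating it rebuilds an equal str.
    assert eval(repr(s)) == s

print("all round-trips succeeded")
```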

This all leads me to think that eval(repr(x)) isn't what you want, unless you're trying to copy a string instance in the slowest way possible :D. No code will ever be executed unless someone gives you an unsanitized_user_input object that looks like a string but isn't, but that's a different question.

Hao Lian 2009-07-11 01:42:00

That's exactly right; I definitely don't want anything in the string to be executed. The reason for doing this would make much more sense if I presented the larger context, but I tried to simplify the scenario for the question.

Brandon Thomson 2009-07-11 01:58:32

Answer 4

+2 A:

With everything as you describe, it is technically safe to eval repr'd strings. However, I'd avoid doing it anyway, as it's asking for trouble:

There could be some weird corner case where your assumption that only repr'd strings are stored doesn't hold (e.g. a bug, or a different pathway into the storage that doesn't repr), which instantly becomes a code injection exploit where it might otherwise be unexploitable.
Even if everything is OK now, assumptions might change at some point, and unsanitised data may get stored in that field by someone unaware of the eval code.
Your code may get reused (or worse, copy+pasted) into a situation you didn't consider.

As Alex Martelli pointed out, in Python 2.6 and higher there is ast.literal_eval, which will safely handle both strings and other simple datatypes like tuples. This is probably the safest and most complete solution.
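A sketch of how that looks in Python 3. `safe_load` is a hypothetical helper name, and catching `ValueError`/`SyntaxError` covers the common rejection cases rather than every error literal_eval can raise:

```python
import ast

def safe_load(text):
    """Parse a repr()-style literal; return None if it contains anything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

# Nested tuples, lists, numbers and None are all plain literals.
print(safe_load("('a', ('b', 'c'), [1, 2.5], None)"))
# A function call is not a literal, so it is rejected rather than run.
print(safe_load("__import__('os').getcwd()"))  # None
```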

Another possibility, however, is to use the string-escape codec (available in Python 2 only). This is much faster than eval (about 10 times, according to timeit), is available in earlier versions than literal_eval, and should do what you want:

>>> s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
>>> repr(s)[1:-1].decode('string-escape') == s
True

(The [1:-1] is to strip the outer quotes repr adds.)
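The string-escape codec no longer exists in Python 3. A rough analogue, assuming ASCII-only input, is the `unicode_escape` codec via `codecs.decode`. Non-ASCII characters that repr() leaves unescaped may be mis-decoded, so treat this as a sketch rather than a drop-in replacement; the speed figure above has not been re-measured for it, and `escape_roundtrip` is a hypothetical name:

```python
import codecs

def escape_roundtrip(s: str) -> str:
    # repr() adds the surrounding quotes; [1:-1] strips them, as above.
    escaped = repr(s)[1:-1]
    # 'unicode_escape' turns \n, \', \x00 etc. back into characters.
    # Assumption: s is ASCII; other characters may be mis-decoded.
    return codecs.decode(escaped, "unicode_escape")

s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
print(escape_roundtrip(s) == s)  # True
```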

Brian 2009-07-11 10:57:19

Answer 5

+1 A:

repr([unsanitized_user_input_1,
      unsanitized_user_input_2,
      ...
... unsanitized_user_input is a str object

You shouldn't have to serialise strings to store them in a database.

If these are all strings, as you mentioned - why can't you just store the strings in a db.StringListProperty?

The nested entries might be a bit more complicated, but why is this the case? When you have to resort to eval to get data out of the database, you're probably doing something wrong.

Couldn't you store each unsanitized_user_input_x as its own db.StringProperty row, and group them by a reference field?

Either of those may not be applicable, since I've no idea what you're trying to achieve, but my point is: can you not structure the data in a way where you don't have to rely on eval (and on it not being a security issue)?

dbr 2009-07-11 16:03:55

One reason is that the strings need to be compressed to fit within App Engine's 1MB entity limit, and I thought the overhead of compressing 1000s of strings individually would probably be much higher than serializing them, compressing them all together, and putting them into a blob. The space savings would probably be smaller too. But it's a good point... I'm definitely looking for ways to avoid eval without increasing running cost too much.
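One eval-free way to keep the single-blob design, sketched with the standard library's json and zlib modules. This is a swapped-in technique, not something proposed in the thread; `pack_strings`/`unpack_strings` are hypothetical names, and the resulting bytes are assumed to go into whatever blob field the entity already uses:

```python
import json
import zlib

def pack_strings(strings):
    """Serialize a (possibly nested) list of str as JSON, then compress it all at once."""
    payload = json.dumps(strings, ensure_ascii=False).encode("utf-8")
    return zlib.compress(payload, 9)

def unpack_strings(blob):
    """Decompress and parse; json.loads only parses data and never executes code."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

strings = [f"user string {i}" for i in range(1000)]
blob = pack_strings(strings)
assert unpack_strings(blob) == strings
print(len(json.dumps(strings)), "->", len(blob), "bytes")
```

JSON arrays also cover nested lists of strings, and compressing the whole batch keeps the one-compression-per-entity cost the comment is aiming for.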

Brandon Thomson 2009-07-11 17:00:38
