views: 101 · answers: 5

I plan to use these functions in a web environment, so my concern is whether they can be exploited to execute malicious software on the server.

Edit: I don't execute the result. I parse the AST and/or catch SyntaxError.

This is the code in question:

try:
    #compile the code and check for syntax errors
    compile(code_string, filename, "exec")
except SyntaxError, value:
    msg = value.args[0]

    (lineno, offset, text) = value.lineno, value.offset, value.text

    if text is None:
        return [{"line": 0, "offset": 0, 
            "message": u"Problem decoding source"}]

    else:
        line = text.splitlines()[-1]

        if offset is not None:
            offset = offset - (len(text) - len(line))
        else:
            offset = 0

        return [{"line": lineno, "offset": offset, "message": msg}]

else:
    #no syntax errors, check it with pyflakes
    tree = compiler.parse(code_string)
    w = checker.Checker(tree, filename)
    w.messages.sort(lambda a, b: cmp(a.lineno, b.lineno))

checker.Checker is the pyflakes class that walks the AST.
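For readers on Python 3, where the `compiler` module no longer exists, a parse-only syntax check equivalent to the snippet above can be sketched with the `ast` module. This is a sketch; `check_syntax` is a hypothetical helper name, and the returned dict simply mirrors the shape used in the question:

```python
import ast

def check_syntax(code_string, filename="<submitted>"):
    """Parse untrusted source without executing it; return error info or None."""
    try:
        ast.parse(code_string, filename)
    except SyntaxError as exc:
        return {"line": exc.lineno or 0,
                "offset": exc.offset or 0,
                "message": exc.msg}
    return None

# Parsing never runs the code, so side effects in the source do not fire.
assert check_syntax("x = 1") is None
assert check_syntax("def broken(:\n    pass")["line"] == 1
```

The pyflakes step would follow the same pattern, handing the parsed tree to its checker instead of the deprecated `compiler.parse`.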

+2  A: 

compiler.parse and compile could most definitely be used for an attack if the attacker can control their input and the output is executed. In most cases you are going to eval or exec their output to make it run, so those are still the usual suspects; compile and compiler.parse (deprecated, by the way) just add another step between the malicious input and its execution.

EDIT: I just saw that you left a comment indicating that you are actually planning on using these on USER INPUT. Don't do that. Or at least, don't actually execute the result: that's a huge security hole for whoever ends up running that code. And if nobody is going to run it, why compile it? Since you clarified that you only want to check syntax, this should be fine. I would not store the output, though: there's no reason to make anything easier for a potential attacker, and getting arbitrary code onto your system is a first step.

If you do need to store it, I would favor a scheme similar to the one commonly used for uploaded images, where files are renamed in a non-predictable manner, with the added step of making sure it is not stored on the import path.

aaronasterling
"compiler.parse and compile could most definitely be used for an attack"? How so? If no `eval` or `exec` is done, what's the attack?
S.Lott
I am parsing the AST tree node by node, and try to catch SyntaxError. Other than that, nothing.
dekomote
I think even ast.literal_eval(node_or_string) is safe.
Paulo Scardine
@S.Lott, read very next sentence and rest of post. Perhaps my language wasn't precise enough but I got the point across.
aaronasterling
@Paulo Scardine The OP made no mention of `ast.literal_eval`, and it can only evaluate literals, e.g. tuples, lists, and dicts. It's safe precisely because it's so limited.
aaronasterling
@dekomote. If you are not executing the resulting code, then I would think that it's safe to use the `ast` module.
aaronasterling
@aaronsterling: "read very next sentence and rest of post". Why? The sentence "compiler.parse and compile could most definitely be used for an attack" doesn't seem to make sense. What is the exploit if `eval` and `exec` are **never** used? Could you please clarify your answer?
S.Lott
@S. Lott. Given the new information from the OP, I updated.
aaronasterling
@aaronsterling: "Realistically, you are going to either eval or exec their output"? Really? How do you know this?
S.Lott
@S. Lott, because the whole point of compiling code is, in most cases, to run it. That's why compilers were invented (I'm sure I don't need to give you a history lesson). There is little reason to be interested in whether code _will_ compile if one does not intend to _compile_ it, and there is little reason to compile it if one does not intend for someone, somewhere down the road, to _run_ it. This is pretty standard. The fact that the OP has a use for merely checking the syntax without wanting to actually run the code is rare and out of the ordinary.
aaronasterling
@aaronsterling: the comment was meant for dekomote; if he is checking syntax just to validate data, he can even evaluate it with ast.literal_eval.
Paulo Scardine
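As an aside on the `ast.literal_eval` point above: it accepts only literal containers and rejects anything containing a call or a name, which is exactly why it is safe. A small illustration:

```python
import ast

# literal_eval understands only literals: numbers, strings, tuples,
# lists, dicts, sets, booleans, and None.
assert ast.literal_eval("[1, 2, {'a': (3, 4)}]") == [1, 2, {'a': (3, 4)}]

# Anything containing a call or a name is rejected, never executed.
rejected = False
try:
    ast.literal_eval("__import__('os').system('id')")
except ValueError:
    rejected = True
assert rejected
```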
@aaronsterling: "This is pretty standard". But -- it appears -- unsupported by any aspect of the question. How do you **know** running the code is part of question? And if you don't know, why is that assumption so important?
S.Lott
@S. Lott I have to grant you that. To be honest, I'm at as much of a loss to explain why my answer was selected as you no doubt are.
aaronasterling
+4  A: 

I think the more interesting question is what are you doing with the compiled functions? Running them is definitely unsafe.

I've tested the few exploits I could think of. Seeing as it's just a syntax check (you can't redefine classes/functions etc.), I don't think there is any way to get Python to execute arbitrary code at compile time.
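A quick way to convince yourself of this: `compile` builds a code object without running any of it; the source's side effects only fire under `exec`/`eval`. A minimal demonstration:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    # Compiling only builds a code object; the print call is not executed.
    code_obj = compile("print('side effect!')", "<user input>", "exec")
assert buf.getvalue() == ""  # nothing ran at compile time

with contextlib.redirect_stdout(buf):
    exec(code_obj)  # execution is the separate, dangerous step
assert buf.getvalue() == "side effect!\n"
```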

tobyodavies
"I think the more interesting question..." is more of a comment than an answer, isn't it?
S.Lott
No, I tested the exploits I could think of and came to the conclusion that Python != Lisp, and thus you can't execute code or redefine classes at compile time.
tobyodavies
+1  A: 

Yes, they can be maliciously exploited.

If you really want safe sandboxing, you could look at PyPy's sandboxing features, but be aware that sandboxing is not easy, and there may be better ways to accomplish whatever you are seeking.

Correction

Since you've updated your question to clarify that you're only parsing the untrusted input to AST, there is no need to sandbox anything: sandboxing is specifically about executing untrusted code (which most people probably assumed your goal was, by asking about sandboxing).

Using compile / compiler only for parsing this way should be safe: Python source parsing does not have any hooks into code execution. (Note that this is not necessarily true of all languages: for example, Perl cannot be (completely) parsed without code execution.)

The only other remaining risk is that someone may be able to craft some pathological Python source code that makes one of the parsers use runaway amounts of memory / processor time, but resource exhaustion attacks affect everything, so you'll just want to manage this as it becomes necessary. (For example, if your deployment is mission-critical and cannot afford a denial of service by an attacker armed with pathological source code, you can execute the parsing in a resource-limited subprocess).
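One way to sketch that resource-limited subprocess, assuming a Unix host (it relies on the `resource` module, which is Unix-only); `parse_limited`, the snippet name, and the specific limits are illustrative choices, not a hardened recipe:

```python
import subprocess
import sys
import textwrap

# Child-process body: cap CPU time and address space before parsing, so
# pathological input can at worst kill the child, never the server.
PARSER_SNIPPET = textwrap.dedent("""
    import ast, resource, sys
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))            # 2 s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 << 20,) * 2)   # 512 MiB
    try:
        ast.parse(sys.stdin.read())
    except SyntaxError as exc:
        print("syntax error: line", exc.lineno)
""")

def parse_limited(source, wall_timeout=5):
    """Parse `source` in a resource-capped child; return (ok, output)."""
    proc = subprocess.run([sys.executable, "-c", PARSER_SNIPPET],
                          input=source, capture_output=True,
                          text=True, timeout=wall_timeout)
    return proc.returncode == 0, proc.stdout
```

An ordinary syntax error is reported normally, while a parser blow-up from pathological input only takes down the capped child process.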

Piet Delport
How? How can they be exploited? What's the attack against `compile`?
S.Lott
S.Lott: clarified, based on your updated question.
Piet Delport
...based on *dekomote*'s updated question, even.
Piet Delport
+2  A: 

If the resulting code or AST object is never evaluated, I think you are only subject to denial-of-service (resource exhaustion) attacks.

If you are evaluating user-supplied code, it is the same as giving every user shell access as the webserver user.

Paulo Scardine
+1  A: 

They are not, but it's not too hard to find a subset of Python that can be sandboxed to a point. If you want to go down that road, you need to parse that subset of Python yourself and intercept all calls, attribute lookups, and everything else involved. You also don't want to give users access to language constructs such as non-terminating loops.

Still interested? Head over to jinja2.sandbox :)

Armin Ronacher