views: 108
answers: 2

I would like to enable students to submit Python code solutions to a few simple Python problems. My application will be running on GAE. How can I limit the risk from malicious code that is submitted? I realize that this is a hard problem, and I have read related Stack Overflow and other posts on the subject. I am curious whether the restrictions already in place in the GAE environment make it simpler to limit the damage that untrusted code could inflict. Is it possible to simply scan the submitted code for a few restricted keywords (exec, import, etc.) and then ensure the code runs for less than a fixed amount of time, or is it still difficult to sandbox untrusted code even in the restricted GAE environment? For example:

# Check and execute untrusted code in GAE
untrustedCode = """# Untrusted code from students."""

class TestSpace(object):
    pass

testspace = TestSpace()

try:
    # Check the untrusted code somehow and raise an exception if it
    # tries to import modules or access the network.
    pass
except Exception:
    print "Code attempted to import or access network"

try:
    # exec the code in a new namespace (thanks Alex Martelli)
    # and limit its runtime somehow
    exec untrustedCode in vars(testspace)
except Exception:
    print "Code took more than x seconds to run"
+5  A: 

@mjv's smiley comment is actually spot-on: make sure the submitter IS identified and associated with the code in question (which presumably is going to be sent to a task queue), and log any diagnostics caused by an individual's submissions.

Beyond that, you can indeed prepare a test-space that's more restrictive (thanks for the acknowledgment;-), including a special '__builtins__' namespace that contains only what you want the students to be able to use and redefines __import__ &c. That, plus a token pass to forbid exec, eval, import, __subclasses__, __bases__, __mro__, ..., gets you closer. A totally secure sandbox in a GAE environment, however, is a real challenge, unless you can whitelist a tiny subset of the language that the students are allowed to use.
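
For concreteness, here is a minimal sketch of such a restricted test-space - emphatically not a complete sandbox, since code can still dig its way back to the real builtins through object introspection; the whitelist is illustrative:

def _no_import(name, *args, **kwargs):
    # Replaces __import__ so that any import statement fails.
    raise ImportError("imports are not allowed")

# A hand-picked whitelist of builtins the students may use (illustrative).
SAFE_BUILTINS = {
    '__import__': _no_import,
    'len': len, 'range': range, 'abs': abs, 'min': min, 'max': max,
    'sum': sum, 'sorted': sorted, 'str': str, 'int': int, 'float': float,
    'True': True, 'False': False,
}

def run_in_testspace(source):
    # exec the submission with the restricted __builtins__ in place.
    testspace = {'__builtins__': SAFE_BUILTINS}
    exec source in testspace
    return testspace

run_in_testspace("result = sum(range(10))")   # OK
run_in_testspace("import os")                 # raises ImportError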

So I would suggest a layered approach: the sandbox GAE app in which the students upload and execute their code has essentially no persistence layer to worry about; rather, it "persists" by sending urlfetch requests to ANOTHER app, which never runs any untrusted code and is able to vet each request very critically. Default-denial with whitelisting is still the holy grail, but with such an extra security layer you may be able to afford default-acceptance with blacklisting...
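
A rough sketch of what the "persist via another app" call might look like using the GAE urlfetch API; the grader URL and payload fields are hypothetical:

from google.appengine.api import urlfetch
import urllib

def report_result(student_id, problem_id, output):
    # POST the run's text output to a second, trusted app that never
    # executes untrusted code and can vet every request.
    payload = urllib.urlencode({
        'student': student_id,
        'problem': problem_id,
        'output': output,
    })
    resp = urlfetch.fetch(
        url='https://trusted-grader.appspot.com/submit',  # hypothetical app
        payload=payload,
        method=urlfetch.POST,
        headers={'Content-Type': 'application/x-www-form-urlencoded'})
    return resp.status_code == 200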

Alex Martelli
Based on this view, do you think that it is simpler (and safer) to just download the Python test and solution pair to a Linux system and run them in a temporary Linux user account? All I really need to get back is the text output from the execution, and I could delete the temporary Linux account right after execution to purge any evil spirits.
Chris
@Chris, definitely -- if you can arrange for the untrusted code to be run under an account (esp. one with restrictive quotas so it can't even try a denial-of-service attack - neither on the machine it's running on, nor any other) you're MUCH safer. Sysadmins have been "since forever" giving out accounts to untrusted users (inc. students) with many more degrees of freedom than your students will get from this arrangement, yet a successful escalation-of-privileges ("rootkit";-) attack is very rare.
Alex Martelli
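
For illustration, a minimal sketch of the arrangement discussed in these comments - running a submission under a throwaway account with restrictive quotas on an ordinary Linux grading host (not on GAE). The uid, gid, and limits are made up, and the parent process must start with enough privilege to drop to that account:

import os
import resource
import subprocess

SANDBOX_UID = 40001             # hypothetical unprivileged account
SANDBOX_GID = 40001
CPU_SECONDS = 5
MAX_MEMORY = 64 * 1024 * 1024   # 64 MB address space

def demote_and_limit():
    # Runs in the child just before exec: cap CPU time, memory, and
    # process count, then drop privileges (group first, user last).
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MAX_MEMORY, MAX_MEMORY))
    resource.setrlimit(resource.RLIMIT_NPROC, (20, 20))  # curb fork bombs
    os.setgid(SANDBOX_GID)
    os.setuid(SANDBOX_UID)

def run_submission(path_to_solution):
    # Capture only the text output, as Chris describes above.
    proc = subprocess.Popen(['python', path_to_solution],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            preexec_fn=demote_and_limit)
    out, err = proc.communicate()
    return out, err, proc.returncode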
+2  A: 

You really can't sandbox Python code inside App Engine with any degree of certainty. Alex's idea of logging who's running what is a good one, but if the user manages to break out of the sandbox, they can erase the event logs. The only place this information would be safe is in the per-request logging, since users can't erase that.

For a good example of what a rathole trying to sandbox Python turns into, see this post. For Guido's take on securing Python, see this post.

There are a couple of other options: if you're free to choose the language, you could run Rhino (a JavaScript interpreter) on the Java runtime; Rhino is nicely sandboxed. You may also be able to use Jython; I don't know if it's practical to sandbox it, but it seems likely.

Alex's suggestion of using a separate app is also a good one. This is pretty much the approach that shell.appspot.com takes: It can't prevent you from doing malicious things, but the app itself stores nothing of value, so there's no harm if you do.

Nick Johnson
Thanks Nick. If shell.appspot.com is already letting anonymous users execute untrusted code on appspot, hasn't this example app already tackled the untrusted code problem? As long as I limit the amount of time that code can run and limit access to URL fetch, what would I be at risk of students doing in my copy of shell? Put another way, why can't I anonymously abuse shell.appspot.com in whatever way my students might attempt to abuse my app?
Chris
Well, it 'tackles' the issue by more or less ignoring it: the shell app itself is sandboxed, and while it can't prevent you from doing anything you want with/to it, you can't escape the sandbox and do harm to other apps, or modify the shell app for other users. You can't directly limit the amount of time your students' code can run - but the runtime system will terminate the request after 30 seconds anyway. You also can't prevent them from using URLFetch, though you could make it rather difficult.
Nick Johnson
Thanks again Nick. I think I'm heading down the path of modifying the GAE shell app a bit and passing it my problems and the student solutions as strings. I seem to be able to eval strings and check problems this way in shell.appspot.com without much difficulty. Thanks for the insight.
Chris
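
A rough sketch of the kind of string-based check described in this last comment, assuming each problem names a function to test; the function name and test cases are illustrative:

def grade(solution_source, func_name, test_cases):
    # exec the student's solution in a fresh namespace, then call the
    # required function against known inputs and expected outputs.
    namespace = {}
    exec solution_source in namespace
    func = namespace[func_name]
    return all(func(*args) == expected for args, expected in test_cases)

student_code = """
def add(a, b):
    return a + b
"""
print grade(student_code, 'add', [((1, 2), 3), ((0, 0), 0)])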