views:

64

answers:

3

Scenario: a program running in a byte code virtual machine like Java or Python, wants to evaluate (by compiling on the fly to byte code and then running) a function whose code was automatically generated or supplied from outside.

The tricky bit is that the function's code is not trusted -- it may have been generated by a stochastic method like genetic programming, or even supplied by an adversary. Therefore it is desired to enforce that it behaves as a pure function -- it can return a value, but it may not have any side effects, i.e. it may not alter any of the program's existing data in any way.

Another tricky bit is that the function may have a legitimate need to call some of the program's existing functions; some of these functions may have side effects, but these should be prevented from actually having any lasting effect as long as they are being called by the suspect function.

Also, it is preferable that no constraint be placed on the coding style of the suspect function, e.g. it is free to perform destructive updates on any data structures that it creates itself, only its overall effect is required to be purely functional.

Furthermore, it is preferable that the solution have reasonably low overhead, because this may need to be done billions of times; it would be better, for example, to avoid having to fork a whole new virtual machine for each such function.

This doesn't necessarily have to be doable in an existing virtual machine like Java or Python; if it is necessary to design a virtual machine around this use case, then so be it.

Are there any already known solutions (or non-solutions, i.e. things that are known not to work) for this problem?

+1  A: 

I'd imagine sandboxing is your only option. Trying to analyze the program and determine if it is safe is a halting problem equivalent. The CLR has security built in that allow restrictions like this, I imagine Java has similar ones. I don't think Python does.

James Deville
+1  A: 

Well, the general problem seems to be unsolveable: there isn't a terminating strategy for sorting inherently stateful computations from the possibly state-free. Unless the byte code is especially constructed to provide the kind strong type constraints required to overcome this, then you'll be lost. Curien has written much about what kind of things can and can't be inferred from black box observations.

But if you are willing to demand more from your function provider, the question is begging for proof-carrying code (PCC) as an answer. I'm guessing you are aware of the work of Necula, who was particularly concerned with ensuring that assembly code respected constraints on memory usage, such as not tampering with out-of-scope state; you might not be aware of the work done on automatic inference of proofs in fairly common cases: it may be the case that PCC is an easier option than you think.

Charles Stewart
+1  A: 

I, and many others have previously constructed languages for genetic programming purposes. If constructing a new language is an option, then such solutions already exists. Since there are techniques for automatic function generation, it should be trivial to provide function libraries. This system will, in effect constitute a sand-boxed environment. Any side effects of these functions will be limited to the program accessible space.

Killroy