Scenario: a program running in a byte code virtual machine like Java or Python, wants to evaluate (by compiling on the fly to byte code and then running) a function whose code was automatically generated or supplied from outside.
The tricky bit is that the function's code is not trusted -- it may have been generated by a stochastic method like genetic programming, or even supplied by an adversary. Therefore it is desired to enforce that it behaves as a pure function -- it can return a value, but it may not have any side effects, i.e. it may not alter any of the program's existing data in any way.
Another tricky bit is that the function may have a legitimate need to call some of the program's existing functions; some of these functions may have side effects, but these should be prevented from actually having any lasting effect as long as they are being called by the suspect function.
Also, it is preferable that no constraint be placed on the coding style of the suspect function, e.g. it is free to perform destructive updates on any data structures that it creates itself, only its overall effect is required to be purely functional.
Furthermore, it is preferable that the solution have reasonably low overhead, because this may need to be done billions of times; it would be better, for example, to avoid having to fork a whole new virtual machine for each such function.
This doesn't necessarily have to be doable in an existing virtual machine like Java or Python; if it is necessary to design a virtual machine around this use case, then so be it.
Are there any already known solutions (or non-solutions, i.e. things that are known not to work) for this problem?