views:

314

answers:

7

I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it it would make a great PhD thesis.

That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:

function isPure(functionMetadata): boolean;
begin
   result = true;
   for each variable in functionMetadata.variablesModified
      result = result and variable.isLocalToThisFunction;
   for each dependency in functionMetadata.functionsCalled
      result = result and isPure(dependency);
end;

That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.

Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.

Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so not slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.

Does anyone with a bit more knowledge in this area know what I'm missing?

+3  A: 

Here's the first thing that popped into my mind when I read your question.

Class Hierarchies

Determining if a variable is modified includes the act of digging through every single method which is called on the variable to determine if it's mutating. This is ... somewhat straight forward for a sealed type with a non-virtual method.

But consider virtual methods. You must find every single derived type and verify that every single override of that method does not mutate state. Determining this is simply not possible in any language / framework which allows for dynamic code generation or is simply dynamic (if it's possible, it's extremely difficult). The reason why is that the set of derived types is not fixed because a new one can be generated at runtime.

Take C# as an example. There is nothing stopping me from generating a derived class at runtime which overrides that virtual method and modifies state. A static verified would not be able to detect this type of modification and hence could not validate the method was pure or not.

JaredPar
Oh, good point. Polymorphism would definitely complicate things. Although that could be fixed by putting a "pure constraint" on the base virtual function.
Mason Wheeler
+10  A: 

You also need to annotate every system call, every FFI, ...

And furthermore the tiniest 'leak' tends to leak into the whole code base.

It is not a theoretically intractable problem, but in practice it is very very difficult to do in a fashion that the whole system does not feel brittle.

As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.

And I am sure lots of people continue to look at this 'in practice'. (wild speculation) In 20 years we may have this.

Brian
Computer AI though is only 4-6 years away and sure it will be able to solve this problem for us :)
JaredPar
In Haskell every function is pure. `IO` doesn't change that. Well, maybe except of those ones that do `unsafePerformIO`.
Andrey Vlasovskikh
Anootating all the system class is that .NET is doing for their Code Contracts. When in doubt, it assumes the method is not pure.
Jonathan Allen
Jared: Hasn't computer AI been "only 4-6 years away" for the last 30 years or so? :P
Mason Wheeler
@Mason or maybe even 50 years or so %)
Vanya
@Mason - I think Jared is joking.
Chris Lutz
+5  A: 

It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function will anObject.aFunc() call or even if it will be a function at all.

Rafał Dowgird
Ouch! All right, I can see how that would make it more difficult.
Mason Wheeler
There's more - eval, setattr(), __gettattr()__ doing weird things, etc. Features like this make languages hard to analyze statically.
Rafał Dowgird
+4  A: 

In addition to the other excellent answers here: Your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent." In other words, the output is completely dependent on the input. So something as simple as reading the current time and making that a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.

Also, you could write a "pure" function that did modify variables.

Craig Stuntz
+1 Being side-effect free doesn't mean using only immutable values.
Andrey Vlasovskikh
A: 

I think the main problem would be doing it efficiently.

D-language has pure functions but you have to specify them yourself, so the compiler would know to check them. I think if you manually specify them then it would be easier to do.

egon
A: 

Deciding whether a given function is pure, in general, is reducible to deciding whether any given program will halt - and it is well known that the Halting Problem is the kind of problem that cannot be solved efficiently.

Justice
I know about the halting problem, but why do you say it's equivalent?
Mason Wheeler
Suppose I write a program P which simulates the execution of some input function F (i.e. P is an interpreter). Suppose P is written such that it halts upon complete evaluation of F and immediately after executing any impure step of F (with an output indicating why it halted). Because some F might have the form `f a = f a` - Haskell syntax for a function with one argument that simply calls itself with the same argument recursively ad infinitum - there is some F for which P executing F will not halt. Thus, the OP's question is reducible to the halting problem.
Justice
Note that in Haskell, if we get rid of the language's loopholes permitting certain impure code inside otherwise pure code, the compiler can easily determine what code is for sure pure and what code is not for sure pure simply by looking at the types. Haskell represents impurity at the type level.
Justice
Generally speaking, pure functions in computing don't exclude non-terminating (or exceptionally terminating) functions for this very reason. Haskell actually defines nontermination to be a special kind of return value (the 'bottom') which effectively poisons any operation that attempts to inspect it.
bdonlan
A: 

Note that the complexity depends on the language, too. For the more dynamic languages, it's possible to redefine anything at any time. For example, in Tcl

proc myproc {a b} {
    if { $a > $b } {
        return $a
    } else {
        return $b
    }
}

Every single piece of that could be modified at any time. For example:

  • the "if" command could be rewritten to use and update global variables
  • the "return" command, along the same lines, could do the same thing
  • the could be an execution trace on the if command that, when "if" is used, the return command is redefined based on the inputs to the if command

Admittedly, Tcl is an extreme case; one of the most dynamic languages there is. That being said, it highlights the problem that it can be difficult to determine the purity of a function even once you've entered it.

RHSeeger