tags:

views:

322

answers:

3

I've decided on PostSharp (indistinguishable from magic) to read attributes and memoize functions. The hash of the function call will be the key and the cached (in Velocity) result will be returned instead of calling the function again. Easy peasy, mac-and-cheesy.

I've already given up on being able to detect side effects in decorated functions, which turned out to be a "hard problem", even for the experts, which I am certainly not. Next, I've got to figure out what other functions are candidates for memoization.

  • What about methods that take complex reference types as parameters?
  • What about methods that depend on data inside of the instances they're called from?

ActiveRecord-esque data objects come to mind on that last one.

Am I going to have to refactor week-old code to support memoization?

+2  A: 

Well, any function, theoretically, is a candidate for memoization. However, remember that memoization is all about trading space for speed -

In general, this means that, the more state a function requires or depends on in order to compute an answer, the higher the space cost, which reduces the desirability to memoize the method.

Both of your examples are basically cases where more state would need to be saved. This has two side effects.

First, this will require a lot more memory space in order to memoize the function, since more information will need to be saved.

Second, this will potentially slow down the memoized function, since the larger the space, the higher the cost of the lookup of the answer, as well as the higher the cost in finding whether or not a result was previously saved.

In general, I tend to only consider functions that have few possible inputs and low storage requirements, unless there is a very high cost in calculating the answer.

I acknoledge that this is vague, but this is part of the "artistry" in architecture. There's no "right" answer without implementing both options (memozied and non-memoized functions), profiling, and measuring.

Reed Copsey
Can you point me to a document or library that discusses memoizing big-data functions? That is, functions with large parameters or required instance variables?
Chris McCall
I haven't seen any - although I've done this. The key to making this effective is having a very good (fast and low conflict) hash generation and using a Dictionary<parameters,result> on your required parameters. This makes the lookup and storage easier, but keeps it somewhat efficient. If you do that, it's pretty easy to memoize any function - just keep track of the information and the answer, and check your results.
Reed Copsey
...or, in the case of an instance method that requires data from inside it, Dictionary<instance, parameters, result>?
Chris McCall
Chris: I'd put all of the information into a custom "parameters" structure, including the instance info. You need to have value types for everything that the function requires, and a good, single hash code to work from there. This should include instance data + paramter data.
Reed Copsey
+1  A: 

You can only memoize a function if all of its inputs are value types or immutable reference types, if it returns either a value type or a new instance of a reference type, and if it has no side effects. Period.

Memoization depends on a deterministic mapping between inputs and outputs. Every call to F(a, b, c) in which a, b, and c contain the same values must return the same result in order for memoization to be possible.

If a parameter's a reference type, then even though its value doesn't change, multiple calls to the function using it may produce a different result. A trivial example:

public int MyFunction(MyType t)
{
   return t.Value;
}

Console.WriteLine(MyFunction(t));
t.Value++;
Console.WriteLine(MyFunction(t));

Similarly, if a function depends on a value external to it, then multiple calls to that function with the same parameters can return different results:

int Value = 0;

public int MyFunction(int input)
{
   return Value;
}

Console.WriteLine(MyFunction(1));
Value++;
Console.WriteLine(MyFunction(1));

And heaven help you if your memoized function does something other than return a value or a new reference type:

int Value = 0;

public int MyFunction(int input)
{
   Value++;
   return input;
}

If you call that function 10 times, Value will be 10. If you refactor it to use memoization and then call it 10 times, Value will be 1.

You can start going down the path of figuring out how to memoize state, so that you can phony up a function that memoizes a reference type. But what you're really memoizing is the set of values that the function works on. You can similarly hack a memoized function that has side effects so that its side effects occur before the memoization. But this is all begging for trouble.

If you want to implement memoization into a function that takes a reference type, the proper approach is to refactor out the part of the function that only works on value types, and memoize that function, e.g.:

public int MyFunction(MyType t)
{
   return t.Value + 1;
}

to this:

public int MyFunction(MyType t)
{
   return MyMemoizableFunction(t.Value);
}

private int MyMemoizableFunction(int value)
{
   return value + 1;
}

Any other approach to implementing memoization that you take either a) does the same thing, through more obscure means, or b) won't work.

Robert Rossney
This answer could use some clarification: you can certainly memoize a function if it deals only in *immutable* reference types. The downer is that there is no hard concept of an immutable reference type in .NET (yet). But it is certainly possible to construct them; hence while your answer does have a valid insight behind it, the opening sentence is basically untrue.
Daniel Earwicker
I'm gonna take this answer anyway. In my head, I changed "can" to "should" in the first sentence.
Chris McCall
Quite right. Also, I should have mentioned the issue of side effects in the first sentence, since memoizing a function that has side effects is a Very Bad Thing.
Robert Rossney
This isn't exactly true - you can memoize a function that works with reference types, even mutable reference types, as long as you only work with value types contained within those reference types. This is basically teh same as refactoring out the values from your references, but technically, it works fine. You need to make sure that your memoization takes this into account, though.
Reed Copsey
That's what I meant by "does the same thing, through more obscure means."
Robert Rossney
@Robert - at least remove the word "period"!
Daniel Earwicker
A: 

You already have thought of a way to provide an AOP solution to provide memoization around function Foo, so what is there left to figure out?

Yes, you can pass an object of arbitrary complexity as a parameter to a memoized function, as long as it is immutable, as are all the things it depends upon. Again, this is not at all easy to discover statically at the moment.

Are you still wedded to the idea that you can statically examine the code in order to advise your users on the question "Is it a good idea to apply memoization to function Foo?"

If you make that one of your requirements, you will be joining a global research effort that has lasted many years so far. Depends on how ambitious you are, I guess.

Daniel Earwicker