views:

623

answers:

12

There are a few automatic memoization libraries available on the internet for various languages, but without knowing what they are for, where to use them, and how they work, it can be difficult to see their value. What are some convincing arguments for using memoization, and in what problem domains does memoization especially shine? Information for the uninformed would be especially appreciated here.

+10  A: 

Memoization shines in problems where solutions to subproblems can be reused. Speaking simply, it is a form of caching. Let's look at the factorial function as an example.

3! is a problem on its own, but it's also a subproblem of n! for n > 3; for example, 4! = 4 * 3!. A function that calculates factorials can perform much better with memoization because it will only ever calculate 3! once and store the result internally in a hash table. Whenever it comes across 3! again it will look the value up in the table instead of recalculating it.

Any problem where subproblem solutions can be reused (the more frequently the better) is a candidate for using memoization.
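
As a minimal illustration (a hand-rolled dictionary cache in Python, not any particular library; the names are made up):

_fact_cache = {}

def factorial(n):
    # Return a cached result if we've already computed n!.
    if n in _fact_cache:
        return _fact_cache[n]
    result = 1 if n <= 1 else n * factorial(n - 1)
    _fact_cache[n] = result
    return result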

Jason Punyon
Doubtful. I suspect your hashtable lookup will be much more expensive than recalculating 3!.
Jonathan Allen
@Jonathan: What about 10!? 50!?
Jason Punyon
@Jonathan : Sure, 3! is faster to recompute. Is `20!`? Is `100!`? Is `200!`? I'll take `O(1)` anytime.
Stephen
On .NET using doubles the break-even point is 15!. For Int32 it is 35!. For Int64 it is a pitiful 7!.
Jonathan Allen
@Stephen, would you really? If you are using Int32 the largest number you can get is 16!. The break-even point for Int32 is 35!.
Jonathan Allen
@Jonathan : Thanks for bringing data, interesting to compare them - I assume the diff between `int32` and `int64` is because you're on a 32-bit machine?
Stephen
@Jonathan : Yes. Yes I definitely would when I'm using BigNum libraries. Stop trying to trivialize the problem.
Stephen
BigNum has problems of its own. Without some sort of bounding constraint your hashtable will grow forever. In .NET this means every 2^Nth insertion is an O(n) operation and you risk heap fragmentation. Also, what about thread safety?
Jonathan Allen
@Jonathan : You have this insane idea that we're advocating memoization as the solution to all problems. Use bignum, no more integer overflow! Be serious. These are small focused problems. I (we) recognize the constraints of memoization and know we can't just infinitely grow a cache... but thanks for pointing it out.
Stephen
@Stephen. You say that, but your answer still has 3! as the example and makes no mention of the dangers that reckless caching can cause.
Jonathan Allen
@Jonathan Allen - such danger is alleviated by not using a memoisation implementation written by a drunken baboon, which has such advanced features as limiting the amount of data stored.
detly
The implementation has that restriction, I mean.
detly
+1  A: 

I use memoization all the time when migrating data from one system to another (ETL). The concept is that if a function will always return the same output for the same set of inputs, it may make sense to cache the result - especially if it takes a while to calculate that result. When doing ETL, you're often repeating the same actions lots of times on lots of data, and performance is often critical. When performance isn't a concern or is negligible, it probably doesn't make sense to memoize your methods. Like anything, use the right tool for the job.
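
A small sketch of that idea, assuming Python and using functools.lru_cache as the memoizer; the transformation is a made-up stand-in for a slow per-row lookup:

from functools import lru_cache

@lru_cache(maxsize=None)
def normalize_code(legacy_code):
    # Stand-in for a slow lookup/transformation applied to every row;
    # each distinct input is now computed only once per run.
    return legacy_code.strip().upper().replace(" ", "_")

# Millions of rows, but only a handful of distinct codes to resolve.
rows = ["us ", "us ", "de", "us ", "de"]
cleaned = [normalize_code(code) for code in rows]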

mattmc3
How do you deal with thread safety and infinite growth in your hash table?
Jonathan Allen
Thread safety is a non-issue, and even if you have millions and millions of data rows, you may only have a few thousand distinct combos to memoize in those millions of rows. In a properly set up ETL environment, you should have plenty of RAM anyway.
mattmc3
+2  A: 

Memoization is essentially caching the return value of a function for a given input. This is useful if you're going to repeat a function call many times with the same input, and especially so if the function takes some time to execute. Of course, since the data has to be stored somewhere, memoization is going to use more memory. It's a tradeoff between using CPU and using RAM.
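
A bare-bones sketch of that caching, written as a Python decorator (assumes hashable positional arguments only):

import functools

def memoize(func):
    cache = {}                      # the RAM side of the tradeoff

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:       # pay the CPU cost once per distinct input
            cache[args] = func(*args)
        return cache[args]

    return wrapper

@memoize
def slow_square(x):
    return x * x                    # stand-in for an expensive computation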

Brian S
+11  A: 

Memoization is a technique for storing the answers to subproblems, so that a program does not need to re-solve the same sub-problems later.

It is often an important technique in solving problems using Dynamic Programming.

Imagine enumerating all paths from the top-left corner of a grid to the bottom-right corner. Many of the paths overlap each other. You can memoize the solutions calculated for each point on the grid, building from the bottom-right back up to the top-left. This takes the computing time down from "ridiculous" to "tractable".
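
A sketch of the counting flavour of that grid problem (illustrative Python, not the original challenge code; moves are restricted to right and down along the grid's edges):

def count_paths(rights, downs, memo=None):
    # Paths remaining when `rights` right-steps and `downs` down-steps are left.
    if memo is None:
        memo = {}
    if rights == 0 or downs == 0:
        return 1
    key = (rights, downs)
    if key not in memo:
        memo[key] = (count_paths(rights - 1, downs, memo)
                     + count_paths(rights, downs - 1, memo))
    return memo[key]

print(count_paths(20, 20))  # 137846528820 -- instant with the memo, hopeless without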

Another use is: List the factorials of the numbers 0 to 100. You do not want to calculate 100! using 100 * 99 * ... * 1. You already calculated 99!, so reuse that answer and simply multiply 100 by the answer to 99!. You can memoize the answer at each of these steps (working from 1 up to 100) to save yourself a significant amount of computation.

For a data point, for my grid solving problem above (the problem is from a programming challenge):

Memoized:

real  0m3.128s
user  0m1.120s
sys   0m0.064s

Non-memoized (which I killed, because I was tired of waiting... so this is incomplete)

real 24m6.513s
user 23m52.478s
sys   0m6.040s
Stephen
If you have an m-by-n grid, assuming your path can only go down and right, there are (m+n-2)!/((m-1)!(n-1)!) paths. Even if m and n are only around 10, there are around 50K paths. This is a lot of output! (If you allow lefts and ups, there are an infinite number of paths, unless you disallow cycles.)
Paul Hanbury
Stephen
Sorry, if the answer is a number, I think I already gave it away. I thought you meant that you were printing out all of the paths.
Paul Hanbury
@Paul : Actually, your equation yields only 35M paths :)
Stephen
For what size grid?
Paul Hanbury
My rationale is as follows: there will be m-1 moves down and n-1 moves right in any path from top-left to bottom-right. Suppose I have an urn containing m-1 balls labeled "down" and n-1 balls labeled "right", and I randomly select balls from the urn without replacement, following each direction on my grid as it is drawn. Then I will have moved from the top-left to the bottom-right when all of the balls are exhausted.
Paul Hanbury
Since there are (m+n-2)! orderings in which the balls may be drawn, and (m-1)! and (n-1)! indistinguishable orderings of the "down" and "right" balls (respectively), there are (m+n-2)!/((m-1)!(n-1)!) distinct ways to draw the balls. This is the same as the number of paths from top-left to bottom-right.
Paul Hanbury
@Paul : I suck at probability, I'm not going to argue your reasoning. But for a 2x2 there are 6 paths. Your equation yields 2. For a 10x10, your solution gives 48620, but there are 184756.
Stephen
I see where we differ. You are following the edges of the grid; I was moving from square to square.
Paul Hanbury
A: 

Memoization is just a fancy word for caching. If your calculations are more expensive than pulling the information from the cache, then it is a good thing. The problem is that CPUs are fast and memory is slow. So I have found that using memoization is usually much slower than just redoing the calculation.

Of course there are other techniques available that really do give you significant improvement. If I know that I need f(10) for every iteration of a loop, then I will store that in a variable. Since there is no cache look-up, this is usually a win.

EDIT

Go ahead and down vote me all you want. That won't change the fact that you need to do real benchmarking and not just blindly start throwing everything in hash tables.

If you know your range of values at compile time, say because you are using n! and n is a 32-bit int, then you will do better to use a static array.

If your range of values is large, say any double, then your hash table can grow so large that it becomes a serious problem.

If the same result is used over and over again in conjunction with a given object, then it may make sense to store that value with the object.

In my case I discovered that over 90% of the time the inputs for any given iteration were the same as in the last iteration. That means I just needed to keep the last input and last result, and only recalculate if the input changed. This was an order of magnitude faster than using memoization for that algorithm.
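
A sketch of that last-value approach in Python (a one-entry cache; the wrapped function is hypothetical):

def cache_last_call(func):
    last_args = object()          # sentinel that never equals real arguments
    last_result = None

    def wrapper(*args):
        nonlocal last_args, last_result
        if args != last_args:     # recalculate only when the input changes
            last_result = func(*args)
            last_args = args
        return last_result

    return wrapper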

Jonathan Allen
I would think that the phrase "CPUs are fast and memory is slow" would fade into insignificance when you're talking about the factorial of 100. How do your speed comparisons rate then?
paxdiablo
If you just replace that bit with something like "it's more useful for intense calculations, less so for calculations that are fast", I'll reverse the -1 that someone gave you.
paxdiablo
Also, not everything that is memoized will automatically go into memory; it may simply stay in a CPU register, depending on how you coded it.
Razor Storm
If I really needed 100! to be fast I wouldn't use normal Memoization techniques. Instead I would prepopulate an array with all the factorials that are inside my data type's range at compile time.
Jonathan Allen
@Jonathan, I don't think the point was that pax only needs 100!... I think the point was that if you need to frequently recompute factorial of large and unspecified values, then memoization is probably going to be the best way to get it done. Frankly speaking, if you're just going to have an array of pre-computed factorials, then why don't you just build an enormous (several MB) SQLite file of factorials and cache from there?
Lirik
@Lirik, because there aren't really that many factorials that fit in our normal data types.
Jonathan Allen
"CPUs are fast and memory is slow" — you're comparing apples and oranges. The amount of time a CPU will take to do a long calculation can be arbitrarily large, and the amount of space and memo-lookup-time required for that result can be minimal.
detly
"If you know your range of values at compile time, say because you are using n! and n is a 32-bit int, then you will do better to use a static array." — wrong, because knowing your **range** is not the same as knowing the **specific values** that are going to be used. These may be a tiny subset of the range for all normal usages of the program. If it *is* a tiny subset, a static array could be far worse than memoisation. And manually calculated static array of known results means on extra thing to maintain.
detly
-1 for statements based on the assumptions that the memoization algorithm is written by an idiot, and that people will use memoization to calculate fibonacci in production software.
quantumSoup
This is a horrible answer because it is misleading and false. In fact, nothing in it is true at all. The part of your answer closest to being true is "CPUs are fast and memory is slow", which is still false because there's nothing stopping anyone from building a computer where CPU time is more expensive than memory operations. It's disappointing to see an answer like this from someone with so much rep...
Cam
You bash memoization and then offer a "better" solution which is, in turn, the definition of what memoization is... What are you trying to say? "Downvote me all you want, but that won't change the fact that memoization sucks and that you should do memoization instead"??
Razor Storm
+4  A: 

Memoization can radically speed up algorithms. The classic example is the Fibonacci series, where the recursive algorithm is insanely slow, but memoization automatically makes it as fast as the iterative version.

Steven Sudit
Statically compiling it in an array is even faster.
Jonathan Allen
@Jonathan: You enjoy trolling, don't you?
Chris Charabaruk
+5  A: 

Memoization exchanges time for space.

Memoization can turn exponential time (or worse) into linear time (or better) when applied to problems that are multiple-recursive in nature. The cost is generally O(n) space.

The classic example is computing the Fibonacci sequence. The textbook definition is the recurrence relation:

F(n) = F(n-1) + F(n-2)

Implemented naively, it looks like this:

int fib(int n) {
  if (n == 0) {
    return 0;
  }
  else if (n == 1) {
    return 1;
  }
  else {
    return fib(n-1) + fib(n-2);
  }
}

You can see that the runtime grows exponentially with n because each of the partial sums is computed multiple times.

Implemented with memoization, it looks like this (clumsy but functional):

int fib(int n) {
  static bool initialized = false;
  static std::vector<int> memo;

  if (!initialized) {
    memo.push_back(0);
    memo.push_back(1);
    initialized = true;
  }

  if (memo.size() > n) {
    return memo[n];
  }
  else {
    const int val = fib(n-1) + fib(n-2);
    memo.push_back(val);
    return val;
  }
}

Timing these two implementations on my laptop, for n = 42, the naive version takes 6.5 seconds. The memoized version takes 0.005 seconds (all system time--that is, it's I/O bound). For n = 50, the memoized version still takes 0.005 seconds, and the naive version finally finished after 5 minutes & 7 seconds (never mind that both of them overflowed a 32-bit integer).

Drew Hall
Care to explain the downvote?
Drew Hall
@Jonathan : Your personality sucks.
Stephen
@Jonathan: It's a simple example. For production code, you'd go to BigInts and add a wrapper with a one-time check for n > 0, but other than that I don't see a problem. Obviously you wouldn't use the naive version for long (you'd discover memoization yourself pretty quickly, I imagine). Lighten up...
Drew Hall
+16  A: 

In my opinion, Fibonacci and factorial calculations are not really the best examples. Memoisation really comes into its own when you have:

  1. A huge range of potential inputs to the calculation in question, but the range is still clearly restricted and known
  2. You know ahead of time that any actual use of the program will only use a small subset of possible inputs to your calculation (Fibonacci and factorial fail this)
  3. You don't know which particular inputs will be used at runtime, and therefore which particular results will need to be memoised (Fibonacci and factorial fail this too, up to a point)

Obviously if you do know all possible inputs, and space allows, you can consider replacing the function with a lookup (which is what I'd do for, say, an embedded CRC32 implementation with a known generator).

...even better than #2 is if for any particular run of the program, you can immediately say "the range of potential inputs will be restricted to a subset satisfying these conditions...".

Note that a lot of this might be probabilistic (or intuitive) — sure, someone might try all of the 10^13 possible inputs to your magic calculation, but you know that realistically they won't. If they do, the overhead of memoisation will actually be of no benefit to them. But you may well decide that this is acceptable, or allow bypassing the memoisation in such circumstances.


Here's an example, and I hope it's not too convoluted (or generalised) to be informative.

In some firmware I've written, one part of the program takes a read from an ADC, which could be any number from 0x000 to 0xFFF and calculates an output for some other part of the program. This calculation also takes a set of user-tuneable numbers, but these are only read at program startup. This calculation is quite a hit the first time it's run.

Creating a lookup table ahead of time is ridiculous. The input domain is the Cartesian product of [0x000, ..., 0xFFF] and (a large range of floating point values) and (another large range...) and ... No thanks.

But no user requires or expects the device to work well when conditions change rapidly, and they'd much rather it works better when things are steady. So I make a tradeoff in computational behaviour that reflects these requirements: I want this calculation to be nice and fast when things are stable, and I don't care about when they aren't.

Given the definition of "slowly changing conditions" that the typical user expects, that ADC value is going to settle to a particular value and remain within about 0x010 of its settled value. Which value depends on the conditions.

The result of the calculation can therefore be memoised for these 16 potential inputs. If environmental conditions do change faster than expected, the "furthest" ADC read from the most recent is discarded (eg. if I've cached 0x210 to 0x21F, and then I read 0x222, I drop the 0x210 result).

The drawback here is that if environmental conditions change a lot, that already-slow calculation runs a little slower. We've already established that this is an unusual use-case, but if someone later reveals that actually, they do want to operate it under unusually unstable conditions, I can implement a way to bypass the memoisation.
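
Not the firmware itself, just a Python sketch of the bounded cache and eviction policy described above (all names are illustrative):

MAX_ENTRIES = 16
_cache = {}

def slow_calculation(adc_value):
    # Stand-in for the expensive, tuning-dependent calculation.
    return adc_value * 0.001

def calibrated_output(adc_value):
    if adc_value not in _cache:
        _cache[adc_value] = slow_calculation(adc_value)
        if len(_cache) > MAX_ENTRIES:
            # Conditions changed faster than expected: drop the cached
            # reading furthest from the one we just saw.
            furthest = max(_cache, key=lambda v: abs(v - adc_value))
            del _cache[furthest]
    return _cache[adc_value]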

detly
+1 for real-world guidance.
Drew Hall
I've added an example now, too.
detly
+1 Nice example.
Robert Harvey
You can do better with a least recently used cache. It'll be about the same as what you have now, except that the 16 can be any 16.
DeadMG
@DeadMG - Presumably that would require dynamically allocated memory? I tend to avoid that for embedded systems. :)
detly
@detly: No, not at all. You could use a statically allocated array. In fact, wherever you were going to store your memoized results will do.
DeadMG
@DeadMG - aha, I just looked it up :P I see what you mean now. That's a useful thing to know, cheers.
detly
+3  A: 

One of the uses for a form of memoization is in game tree analysis. In the analysis of non-trivial game trees (think chess, go, bridge), calculating the value of a position is a non-trivial task and can take significant time. A naive implementation will simply use this result and then discard it, but all strong players will store it and use it should the situation arise again. You can imagine that in chess there are countless ways of reaching the same position.

To achieve this in practice requires endless experimentation and tuning, but it is safe to say that computer chess programs would not be what they are today without this technique.

In AI the use of such memoization is usually referred to as a 'transposition table'.
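
A toy Python sketch of the idea, with a dictionary standing in for a real transposition table; the move generator and evaluation below are placeholders, not a real game:

transposition_table = {}

def moves(position):
    # Placeholder move generator: a "position" here is just a tuple of moves.
    return [position + (n,) for n in (0, 1)] if len(position) < 4 else []

def leaf_value(position):
    # Placeholder static evaluation.
    return sum(position)

def evaluate(position, depth):
    key = (position, depth)                 # positions must be hashable
    if key in transposition_table:
        return transposition_table[key]     # reached this position before
    if depth == 0 or not moves(position):
        value = leaf_value(position)
    else:
        # Negamax-style: best reply from the opponent's point of view.
        value = max(-evaluate(p, depth - 1) for p in moves(position))
    transposition_table[key] = value
    return value

print(evaluate((), 4))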

Daniel
+1  A: 

I think mostly everybody has covered the basics of memoization, but I'll give you some practical examples where memoization can be used to do some pretty amazing things (imho):

  1. In C# you can reflect a function and create a delegate for it, then you can dynamically invoke the delegate... but this is REALLY slow! It's about 30 times slower than invoking the method directly. If you memoize the method invocation, then you can make the invocation nearly as fast as invoking the method directly.
  2. In Genetic Programming it can reduce the overhead of repeatedly calling the same function with similar input parameters for the hundreds and thousands of specimens in the population.
  3. In the execution of expression trees: you don't have to keep re-evaluating the expression tree if you've already memoized it...

Of course there are many more practical examples where memoization can be used, but these are just a few.

In my blog I discuss memoization and reflection separately, but I'm going to post another article about using memoization on reflected methods...

Lirik
+6  A: 

The popular factorial answer here is something of a toy answer. Yes, memoization is useful for repeated invocations of that function, but the relationship is trivial — in the "print factorial(N) for 0..M" case you're simply reusing the last value.

Many of the other examples here are just 'caching', which is useful, but it ignores the awesome algorithmic implications that the word memoization carries for me.

Far more interesting are cases where different branches of a single invocation of a recursive function hit identical sub-problems in a non-trivial pattern, such that indexing into some cache is actually useful.

For example, consider n-dimensional arrays of integers whose absolute values sum to k. E.g. for n=3, k=5, [1,-4,0], [3,-1,1], [5,0,0], and [0,5,0] would be some examples. Let V(n,k) be the number of possible unique arrays for a given n,k. Its definition is:

V(n,0)=1; V(0,k)=0; V(n,k) = V(n-1,k) + V(n,k-1) + V(n-1,k-1);

This function gives 102 for n=3,k=5.

Without memoization this quickly becomes very slow to compute for even fairly modest numbers. If you visualize the processing as a tree, with each node an invocation of V() expanding to three children, you'd have 186,268,135,991,213,676,920,832 V(n,0)=1 leaves in the computation of V(32,32)... Implemented naively this function rapidly becomes uncomputable on available hardware.

But many of the child branches in the tree are exact duplicates of each other, though not in some trivial way that could easily be eliminated as with the factorial function. With memoization we can merge all those duplicate branches. In fact, with memoization V(32,32) only executes V() 1024 (n*k) times, which is a speedup by a factor of 10^21 or so (and it gets larger as n,k grow, obviously) in exchange for a fairly small amount of memory. :) I find this kind of fundamental change to the complexity of an algorithm far more exciting than simple caching. It can make intractable problems easy.

Because Python numbers are naturally bignums, you can implement this formula in Python with memoization using a dictionary and tuple keys in only 9 lines. Give it a shot, and then try it without the memoization.
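
For instance, one possible take (not necessarily the 9-line version the author has in mind):

memo = {}

def V(n, k):
    if k == 0:
        return 1
    if n == 0:
        return 0
    if (n, k) not in memo:
        memo[(n, k)] = V(n - 1, k) + V(n, k - 1) + V(n - 1, k - 1)
    return memo[(n, k)]

print(V(3, 5))    # 102, as stated above
print(V(32, 32))  # returns immediately; hopeless without the memo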

Gmaxwell
This is a great answer!
Drew Hall
You say "Many of the other algorithms here are just caching", then "Far more interesting is ..." and you go on to describe... caching? Then you describe an example sequence similar to fibonacci (or my grid problem), after calling the other answers 'toys' (including those that use fibonacci). Glad you got your answer accepted, but I personally don't think you brought anything new to the table. :)
Stephen
Eh, sorry for not making the distinction more clear. The Fibonacci answer wasn't up when I started writing, and you didn't actually explain your grid example in a way someone could try. Otherwise I probably wouldn't have responded. :) Though the #1 voted answer is, in my view, still not the most interesting or accurate. The factorial really is a toy, and I think I justified that claim. My view on the distinction between 'caching' and memoization is the complexity change, the same distinction you draw with '"ridiculous" to "tractable"'. Cheers.
Gmaxwell
This answer was accepted because it directly relates (remembering past "branch-answers" in recursion) to the algorithm it was being applied to and demonstrated here: http://stackoverflow.com/questions/3242597/what-is-memoization-good-for-and-is-it-really-all-that-helpful/3276775#3276775
Noctis Skytower
@Gmaxwell : Maybe I was in a bad mood when I woke up today and the "more interesting than caching" set me off. Your answer did go into more detail on the impact on complexity, which was good - mine was hand-wavy. I see the distinction you're trying to make now. Your example does show it better than factorial (and I just assumed you saw the `fib()` example, since your post is like 4h later). +1 :)
Stephen
A: 

As an example of how memoization can boost an algorithm's performance, the following runs roughly 300 times faster for this particular test case: about 200 seconds before, roughly two thirds of a second memoized.


class Slice:

    __slots__ = 'prefix', 'root', 'suffix'

    def __init__(self, prefix, root, suffix):
        self.prefix = prefix
        self.root = root
        self.suffix = suffix

################################################################################

class Match:

    __slots__ = 'a', 'b', 'prefix', 'suffix', 'value'

    def __init__(self, a, b, prefix, suffix, value):
        self.a = a
        self.b = b
        self.prefix = prefix
        self.suffix = suffix
        self.value = value

################################################################################

class Tree:

    __slots__ = 'nodes', 'index', 'value'

    def __init__(self, nodes, index, value):
        self.nodes = nodes
        self.index = index
        self.value = value

################################################################################

def old_search(a, b):
    # Initialize startup variables.
    nodes, index = [], []
    a_size, b_size = len(a), len(b)
    # Begin to slice the sequences.
    for size in range(min(a_size, b_size), 0, -1):
        for a_addr in range(a_size - size + 1):
            # Slice "a" at address and end.
            a_term = a_addr + size
            a_root = a[a_addr:a_term]
            for b_addr in range(b_size - size + 1):
                # Slice "b" at address and end.
                b_term = b_addr + size
                b_root = b[b_addr:b_term]
                # Find out if slices are equal.
                if a_root == b_root:
                    # Create prefix tree to search.
                    a_pref, b_pref = a[:a_addr], b[:b_addr]
                    p_tree = old_search(a_pref, b_pref)
                    # Create suffix tree to search.
                    a_suff, b_suff = a[a_term:], b[b_term:]
                    s_tree = old_search(a_suff, b_suff)
                    # Make completed slice objects.
                    a_slic = Slice(a_pref, a_root, a_suff)
                    b_slic = Slice(b_pref, b_root, b_suff)
                    # Finish the match calculation.
                    value = size + p_tree.value + s_tree.value
                    match = Match(a_slic, b_slic, p_tree, s_tree, value)
                    # Append results to tree lists.
                    nodes.append(match)
                    index.append(value)
        # Return largest matches found.
        if nodes:
            return Tree(nodes, index, max(index))
    # Give caller null tree object.
    return Tree(nodes, index, 0)

################################################################################

def search(memo, a, b):
    # Initialize startup variables.
    nodes, index = [], []
    a_size, b_size = len(a), len(b)
    # Begin to slice the sequences.
    for size in range(min(a_size, b_size), 0, -1):
        for a_addr in range(a_size - size + 1):
            # Slice "a" at address and end.
            a_term = a_addr + size
            a_root = a[a_addr:a_term]
            for b_addr in range(b_size - size + 1):
                # Slice "b" at address and end.
                b_term = b_addr + size
                b_root = b[b_addr:b_term]
                # Find out if slices are equal.
                if a_root == b_root:
                    # Create prefix tree to search.
                    key = a_pref, b_pref = a[:a_addr], b[:b_addr]
                    if key not in memo:
                        memo[key] = search(memo, a_pref, b_pref)
                    p_tree = memo[key]
                    # Create suffix tree to search.
                    key = a_suff, b_suff = a[a_term:], b[b_term:]
                    if key not in memo:
                        memo[key] = search(memo, a_suff, b_suff)
                    s_tree = memo[key]
                    # Make completed slice objects.
                    a_slic = Slice(a_pref, a_root, a_suff)
                    b_slic = Slice(b_pref, b_root, b_suff)
                    # Finish the match calculation.
                    value = size + p_tree.value + s_tree.value
                    match = Match(a_slic, b_slic, p_tree, s_tree, value)
                    # Append results to tree lists.
                    nodes.append(match)
                    index.append(value)
        # Return largest matches found.
        if nodes:
            return Tree(nodes, index, max(index))
    # Give caller null tree object.
    return Tree(nodes, index, 0)

################################################################################

import time
a = tuple(range(50))
b = (48, 11, 5, 22, 28, 31, 14, 18, 7, 29, 49, 44, 47, 36, 25, 27,
     34, 10, 38, 15, 21, 16, 35, 20, 45, 2, 37, 33, 6, 30, 0, 8, 13,
     43, 32, 1, 40, 26, 24, 42, 39, 9, 12, 17, 46, 4, 23, 3, 19, 41)

start = time.perf_counter()
old_search(a, b)
stop = time.perf_counter()

print('old_search() =', stop - start)

start = time.perf_counter()
search({}, a, b)
stop = time.perf_counter()

print('search() =', stop - start)

Reference: How can memoization be applied to this algorithm?

Noctis Skytower