There are a few automatic memoization libraries available for various languages; but without knowing what they are for, where to use them, and how they work, it can be difficult to see their value. What are some convincing arguments for using memoization, and in what problem domains does memoization especially shine? Information for the uninformed would be especially appreciated here.
Memoization shines in problems where solutions to subproblems can be reused. Speaking simply, it is a form of caching. Let's look at the factorial function as an example.
3! is a problem on its own, but it's also a subproblem for n! where n > 3, such as 4! = 4 * 3!
A function that calculates factorials can perform much better with memoization because it will only ever calculate 3! once and store the result internally in a hash table. Whenever it comes across 3! again it will look the value up in the table instead of recalculating it.
Any problem where subproblem solutions can be reused (the more frequently the better) is a candidate for using memoization.
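A minimal sketch of the idea in Python, with a dict standing in for the hash table:

_cache = {0: 1}                      # base case: 0! = 1

def factorial(n):
    if n not in _cache:              # compute each n! at most once...
        _cache[n] = n * factorial(n - 1)
    return _cache[n]                 # ...then answer from the table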
I use memoization all the time when migrating data from one system to another (ETL). The concept is that if a function will always return the same output for the same set of inputs, it may make sense to cache the result - especially if it takes a while to calculate that result. When doing ETL, you're often repeating the same actions lots of times on lots of data, and performance is often critical. When performance isn't a concern or is negligible, it probably doesn't make sense to memoize your methods. Like anything, use the right tool for the job.
Memoization is essentially caching the return value of a function for a given input. This is useful if you're going to repeat a function call many times with the same input, and especially so if the function takes some time to execute. Of course, since the data has to be stored somewhere, memoization is going to use more memory. It's a tradeoff between using CPU and using RAM.
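A minimal sketch of that tradeoff as a Python decorator (functools.lru_cache offers the same thing ready-made; this version assumes hashable positional arguments):

import functools

def memoize(fn):
    cache = {}                        # the RAM half of the tradeoff
    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)   # the CPU half, paid only once per input
        return cache[args]
    return wrapper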
Memoization is a technique to store the answers to subproblems, so that a program does not need to re-solve the same subproblems later.
It is often an important technique in solving problems using dynamic programming.
Imagine enumerating all paths from the top-left corner of a grid to the bottom-right corner. Many of the paths overlap each other. You can memoize the solutions calculated for each point on the grid, building from the bottom-right back up to the top-left. This takes the computing time down from "ridiculous" to "tractable".
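A sketch of that in Python, counting (rather than listing) the paths of an m-by-n grid:

from functools import lru_cache

@lru_cache(maxsize=None)
def count_paths(m, n):
    if m == 0 or n == 0:
        return 1                      # only one path along a grid edge
    return count_paths(m - 1, n) + count_paths(m, n - 1)

print(count_paths(20, 20))            # 137846528820, instantly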
Another use is: list the factorials of the numbers 0 to 100. You do not want to calculate 100! using 100 * 99 * ... * 1. You already calculated 99!, so reuse that answer and simply multiply 100 times the answer to 99!. You can memoize the answer at each of these steps (working from 1 up to 100) to save yourself significant amounts of computation.
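In Python, that bottom-up reuse is a three-line loop:

factorials = [1]                             # 0! = 1
for n in range(1, 101):
    factorials.append(factorials[-1] * n)    # n! = n * (n-1)!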
For a data point, for my grid solving problem above (the problem is from a programming challenge):
Memoized:
real 0m3.128s
user 0m1.120s
sys 0m0.064s
Non-memoized (which I killed, because I was tired of waiting... so this is incomplete)
real 24m6.513s
user 23m52.478s
sys 0m6.040s
Memoization is just a fancy word for caching. If your calculations are more expensive than pulling the information from the cache, then it is a good thing. The problem is that CPUs are fast and memory is slow. So I have found that using memoization is usually much slower than just redoing the calculation.
Of course there are other techniques available that really do give you significant improvement. If I know that I need f(10) for every iteration of a loop, then I will store that in a variable. Since there is no cache look-up, this is usually a win.
EDIT
Go ahead and downvote me all you want. That won't change the fact that you need to do real benchmarking and not just blindly start throwing everything into hash tables.
If you know your range of values at compile time, say because you are using n! and n is a 32-bit int, then you will do better to use a static array.
If your range of values is large, say any double, then your hash table can grow so large that it becomes a serious problem.
If the same result is used over and over again in conjunction with a given object, then it may make sense to store that value with the object.
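For the first of those cases (n! over 32-bit ints), the whole table fits in 13 entries, since 13! already overflows a 32-bit int; a sketch:

# Every n! that fits in a signed 32-bit int; 13! = 6227020800 overflows.
FACTORIALS = (1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880,
              3628800, 39916800, 479001600)

def factorial32(n):
    return FACTORIALS[n]    # an IndexError doubles as an overflow guard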
In my case I discovered that over 90% of the time the inputs for any given iteration were the same as the last iteration. That means I just needed to keep the last input and last result, and only recalculate if the input changed. This was an order of magnitude faster than using memoization for that algorithm.
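A sketch of that single-entry cache (the helper name is mine, not from any library):

def cache_last(fn):
    last_args, last_result = object(), None   # sentinel: nothing cached yet
    def wrapper(*args):
        nonlocal last_args, last_result
        if args != last_args:                  # recalculate only on a change
            last_args, last_result = args, fn(*args)
        return last_result
    return wrapper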
Memoization can radically speed up algorithms. The classic example is the Fibonacci series, where the recursive algorithm is insanely slow, but memoization automatically makes it as fast as the iterative version.
Memoization exchanges time for space.
Memoization can turn exponential time (or worse) into linear time (or better) when applied to problems that are multiple-recursive in nature. The cost is generally O(n) space.
The classic example is computing the Fibonacci sequence. The textbook definition is the recurrence relation:
F(n) = F(n-1) + F(n-2)
Implemented naively, it looks like this:
int fib(int n) {
    if (n == 0) {
        return 0;
    }
    else if (n == 1) {
        return 1;
    }
    else {
        return fib(n-1) + fib(n-2);
    }
}
You can see that the runtime grows exponentially with n, because each of the partial sums is computed multiple times: fib(n-2) is computed twice, fib(n-3) three times, fib(n-4) five times, and so on.
Implemented with memoization, it looks like this (clumsy but functional):
#include <vector>

int fib(int n) {
    static bool initialized = false;
    static std::vector<int> memo;
    if (!initialized) {
        memo.push_back(0);   // fib(0)
        memo.push_back(1);   // fib(1)
        initialized = true;
    }
    if (memo.size() > static_cast<std::size_t>(n)) {
        return memo[n];      // already computed: just look it up
    }
    else {
        // the recursive calls fill memo up through index n-1 first
        const int val = fib(n-1) + fib(n-2);
        memo.push_back(val); // lands at index n
        return val;
    }
}
Timing these two implementations on my laptop, for n = 42, the naive version takes 6.5 seconds. The memoized version takes 0.005 seconds (all system time--that is, it's I/O bound). For n = 50, the memoized version still takes 0.005 seconds, and the naive version finally finished after 5 minutes & 7 seconds (never mind that both of them overflowed a 32-bit integer).
In my opinion, Fibonacci and factorial calculations are not really the best examples. Memoisation really comes into its own when you have:
- A huge range of potential inputs to the calculation in question, but the range is still clearly restricted and known
- You know ahead of time that any actual use of the program will only use a small subset of possible inputs to your calculation (Fibonacci and factorial fail this)
- You don't know which particular inputs will be used at runtime, and therefore which particular results will need to be memoised (Fibonacci and factorial fail this too, up to a point)
Obviously if you do know all possible inputs, and space allows, you can consider replacing the function with a lookup (which is what I'd do for, say, an embedded CRC32 implementation with a known generator).
...even better than #2 is if for any particular run of the program, you can immediately say "the range of potential inputs will be restricted to a subset satisfying these conditions...".
Note that a lot of this might be probabilistic (or intuitive) — sure, someone might try all of the 10^13 possible inputs to your magic calculation, but you know that realistically they won't. If they do, the overhead of memoisation will actually be of no benefit to them. But you may well decide that this is acceptable, or allow bypassing the memoisation in such circumstances.
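To make the full-lookup alternative mentioned above concrete, here is a sketch that precomputes a byte-indexed CRC-32 table; the reflected generator 0xEDB88320 is an assumption for illustration:

CRC_TABLE = []
for byte in range(256):
    crc = byte
    for _ in range(8):                # one shift per bit of the input byte
        crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    CRC_TABLE.append(crc)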
Here's an example, and I hope it's not too convoluted (or generalised) to be informative.
In some firmware I've written, one part of the program takes a read from an ADC, which could be any number from 0x000 to 0xFFF, and calculates an output for some other part of the program. This calculation also takes a set of user-tuneable numbers, but these are only read at program startup. This calculation is quite a hit the first time it's run.
Creating a lookup table ahead of time is ridiculous. The input domain is the Cartesian product of [0x000, ..., 0xFFF] and (a large range of floating point values) and (another large range...) and ... No thanks.
But no user requires or expects the device to work well when conditions change rapidly, and they'd much rather it works better when things are steady. So I make a tradeoff in computational behaviour that reflects these requirements: I want this calculation to be nice and fast when things are stable, and I don't care about when they aren't.
Given the definition of "slowly changing conditions" that the typical user expects, that ADC value is going to settle to a particular value and remain within about 0x010 of its settled value. Which value depends on the conditions.
The result of the calculation can therefore be memoised for these 16 potential inputs. If environmental conditions do change faster than expected, the "furthest" ADC read from the most recent is discarded (e.g. if I've cached 0x210 to 0x21F, and then I read 0x222, I drop the 0x210 result).
The drawback here is that if environmental conditions change a lot, that already-slow calculation runs a little slower. We've already established that this is an unusual use-case, but if someone later reveals that actually, they do want to operate it under unusually unstable conditions, I can implement a way to bypass the memoisation.
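A hypothetical Python sketch of that scheme; expensive_calculation and all the names are stand-ins, not the actual firmware:

MEMO_LIMIT = 16
memo = {}

def expensive_calculation(adc_value):
    return adc_value * 1.5            # stand-in for the real, slow calculation

def calculate(adc_value):
    if adc_value not in memo:
        if len(memo) >= MEMO_LIMIT:
            # evict the cached reading furthest from the newest one
            furthest = max(memo, key=lambda v: abs(v - adc_value))
            del memo[furthest]
        memo[adc_value] = expensive_calculation(adc_value)
    return memo[adc_value]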
One of the uses for a form of memoization is in game tree analysis. In the analysis of non-trivial game trees (think chess, Go, bridge), calculating the value of a position is a non-trivial task and can take significant time. A naive implementation will simply use this result and then discard it, but all strong programs will store it and use it should the situation arise again. You can imagine that in chess there are countless ways of reaching the same position.
To achieve this in practice requires endless experimentation and tuning, but it is safe to say that computer chess programs would not be what they are today without this technique.
In AI the use of such memoization is usually referred to as a 'transposition table'.
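A minimal sketch of such a table (a real engine would key it with something like a Zobrist hash, and score here is a stand-in for the expensive evaluation):

transposition_table = {}

def score(position):
    return sum(position)              # stand-in for an expensive evaluation

def evaluate(position):
    key = tuple(position)             # any hashable encoding of the position
    if key not in transposition_table:
        transposition_table[key] = score(position)
    return transposition_table[key]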
I think mostly everybody has covered the basics of memoization, but I'll give you some practical examples where memoization can be used to do some pretty amazing things (imho):
- In C# you can reflect a function and create a delegate for it, then you can dynamically invoke the delegate... but this is REALLY slow! It's about 30 times slower than invoking the method directly. If you memoize the method invocation, then you can make the invocation nearly as fast as invoking the method directly.
- In Genetic Programming it can reduce the overhead of repeatedly calling the same function with similar input parameters for the hundreds and thousands of specimens in the population.
- In the execution of expression trees: you don't have to keep re-evaluating an expression tree if you've already memoized its result...
Of course there are many more practical examples where memoization can be used, but these are just a few.
In my blog I discuss memoization and reflection separately, but I'm going to post another article about using memoization on reflected methods...
The popular factorial answer here is something of a toy answer. Yes, memoization is useful for repeated invocations of that function, but the relationship is trivial — in the "print factorial(N) for 0..M" case you're simply reusing the last value.
Many of the other examples here are just 'caching'. Which is useful, but it ignores the awesome algorithmic implications that the word memoization carries for me.
Far more interesting are cases where different branches of a single invocation of a recursive function hit identical subproblems, but in a non-trivial pattern such that indexing into a cache is actually useful.
For example, consider arrays of n integers whose absolute values sum to k. E.g. for n=3, k=5, [1,-4,0], [3,-1,1], [5,0,0], [0,5,0] would be some examples. Let V(n,k) be the number of possible unique arrays for a given n and k. Its definition is:
V(n,0)=1; V(0,k)=0; V(n,k) = V(n-1,k) + V(n,k-1) + V(n-1,k-1);
This function gives 102 for n=3,k=5.
Without memoization this quickly becomes very slow to compute for even fairly modest numbers. If you visualize the processing as a tree, with each node an invocation of V() expanding to three children, you'd have 186,268,135,991,213,676,920,832 leaves of the form V(n,0)=1 in the computation of V(32,32)... Implemented naively, this function rapidly becomes uncomputable on available hardware.
But many of the child branches in the tree are exact duplicates of each other, though not in some trivial way that could easily be eliminated like the factorial function. With memoization we can merge all those duplicate branches. In fact, with memoization V(32,32) only executes V() 1024 (n*k) times, which is a speedup of a factor of around 10^21 (and it gets larger as n and k grow, obviously) in exchange for a fairly small amount of memory. :) I find this kind of fundamental change to the complexity of an algorithm far more exciting than simple caching. It can make intractable problems easy.
Because Python numbers are naturally bignums, you can implement this formula in Python with memoization using a dictionary and tuple keys in only 9 lines. Give it a shot, and try it without the memoization.
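One possible shape for those nine lines (my sketch, not the author's exact code):

memo = {}

def V(n, k):
    if k == 0: return 1
    if n == 0: return 0
    if (n, k) not in memo:
        memo[(n, k)] = V(n-1, k) + V(n, k-1) + V(n-1, k-1)
    return memo[(n, k)]

print(V(3, 5))    # 102
print(V(32, 32))  # finishes instantly; the naive version never would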
As an example of how to use memoization to boost an algorithm's performance, the following runs roughly 300 times faster for this particular test case: before, it took about 200 seconds; memoized, about two thirds of a second.
class Slice:
    __slots__ = 'prefix', 'root', 'suffix'
    def __init__(self, prefix, root, suffix):
        self.prefix = prefix
        self.root = root
        self.suffix = suffix
################################################################################
class Match:
    __slots__ = 'a', 'b', 'prefix', 'suffix', 'value'
    def __init__(self, a, b, prefix, suffix, value):
        self.a = a
        self.b = b
        self.prefix = prefix
        self.suffix = suffix
        self.value = value
################################################################################
class Tree:
    __slots__ = 'nodes', 'index', 'value'
    def __init__(self, nodes, index, value):
        self.nodes = nodes
        self.index = index
        self.value = value
################################################################################
def old_search(a, b):
    # Initialize startup variables.
    nodes, index = [], []
    a_size, b_size = len(a), len(b)
    # Begin to slice the sequences.
    for size in range(min(a_size, b_size), 0, -1):
        for a_addr in range(a_size - size + 1):
            # Slice "a" at address and end.
            a_term = a_addr + size
            a_root = a[a_addr:a_term]
            for b_addr in range(b_size - size + 1):
                # Slice "b" at address and end.
                b_term = b_addr + size
                b_root = b[b_addr:b_term]
                # Find out if slices are equal.
                if a_root == b_root:
                    # Create prefix tree to search.
                    a_pref, b_pref = a[:a_addr], b[:b_addr]
                    p_tree = old_search(a_pref, b_pref)
                    # Create suffix tree to search.
                    a_suff, b_suff = a[a_term:], b[b_term:]
                    s_tree = old_search(a_suff, b_suff)
                    # Make completed slice objects.
                    a_slic = Slice(a_pref, a_root, a_suff)
                    b_slic = Slice(b_pref, b_root, b_suff)
                    # Finish the match calculation.
                    value = size + p_tree.value + s_tree.value
                    match = Match(a_slic, b_slic, p_tree, s_tree, value)
                    # Append results to tree lists.
                    nodes.append(match)
                    index.append(value)
    # Return largest matches found.
    if nodes:
        return Tree(nodes, index, max(index))
    # Give caller null tree object.
    return Tree(nodes, index, 0)
################################################################################
def search(memo, a, b):
    # Initialize startup variables.
    nodes, index = [], []
    a_size, b_size = len(a), len(b)
    # Begin to slice the sequences.
    for size in range(min(a_size, b_size), 0, -1):
        for a_addr in range(a_size - size + 1):
            # Slice "a" at address and end.
            a_term = a_addr + size
            a_root = a[a_addr:a_term]
            for b_addr in range(b_size - size + 1):
                # Slice "b" at address and end.
                b_term = b_addr + size
                b_root = b[b_addr:b_term]
                # Find out if slices are equal.
                if a_root == b_root:
                    # Create prefix tree to search.
                    key = a_pref, b_pref = a[:a_addr], b[:b_addr]
                    if key not in memo:
                        memo[key] = search(memo, a_pref, b_pref)
                    p_tree = memo[key]
                    # Create suffix tree to search.
                    key = a_suff, b_suff = a[a_term:], b[b_term:]
                    if key not in memo:
                        memo[key] = search(memo, a_suff, b_suff)
                    s_tree = memo[key]
                    # Make completed slice objects.
                    a_slic = Slice(a_pref, a_root, a_suff)
                    b_slic = Slice(b_pref, b_root, b_suff)
                    # Finish the match calculation.
                    value = size + p_tree.value + s_tree.value
                    match = Match(a_slic, b_slic, p_tree, s_tree, value)
                    # Append results to tree lists.
                    nodes.append(match)
                    index.append(value)
    # Return largest matches found.
    if nodes:
        return Tree(nodes, index, max(index))
    # Give caller null tree object.
    return Tree(nodes, index, 0)
################################################################################
import time

a = tuple(range(50))
b = (48, 11, 5, 22, 28, 31, 14, 18, 7, 29, 49, 44, 47, 36, 25, 27,
     34, 10, 38, 15, 21, 16, 35, 20, 45, 2, 37, 33, 6, 30, 0, 8, 13,
     43, 32, 1, 40, 26, 24, 42, 39, 9, 12, 17, 46, 4, 23, 3, 19, 41)

start = time.perf_counter()    # time.clock() was removed in Python 3.8
old_search(a, b)
stop = time.perf_counter()
print('old_search() =', stop - start)

start = time.perf_counter()
search({}, a, b)
stop = time.perf_counter()
print('search() =', stop - start)
Reference: How can memoization be applied to this algorithm?