views:

47

answers:

1

The title more or less says it all:

I have a function which takes symmetric input in two arguments, e.g. something like

def f(a1, a2):
    return heavy_stuff(abs(a1 - a2))

Now, I want to introduce some caching method. Would it be correct / pythonic / reasonably efficient to do something like this:

cache = {}
def g(a1, a2):
    fs =frozenset((tuple(a1), tuple(a2)))
    if fs not in cache:
        cache[fs] = f(a1, a2)
    return cache[fs]

Or would there be some better way?

Edit: a1 and a2 might be the rows of a numpy array; that’s why I wrap them in a tuple each.

+2  A: 

Python always computes all arguments you're passing to a function, and only then does it call the function. In other words, like most other languages, Python is "eager" in its evaluation (the major exception today is probably Haskell, but that doesn't help you;-).

So setdefault is a very unsuitable approach for caching! Whenever you do

cache.setdefault(akey, f(x, y))

you're first calling f(x, y), with all its computational cost, then maybe dropping the results of that computation on the floor; this makes the caching totally ineffective.

Rather, always do it as follows:

akey = whatever(x, y)
if akey not in cache:
    cache[akey] = f(x, y)
return cache[akey]

or the like -- there are a few other possible idioms, especially if e.g. you know that f will never return None:

result = cache.get(akey)
if result is None:
    result = cache[akey] = f(x, y)
return result

As for the secondary issue of what's an appropriate whatever for the key computation, given that you know that f is symmetrical, I think frozenset is probably OK; although (if the components of x and y are comparable, as well as hashable -- i.e. it wouldn't work with complex numbers) you might want to consider

ta1 = tuple(a1)
ta2 = tuple(a2)
if ta1 > ta2: key = ta1, ta2
else: key = ta2, ta1

the relative performance depends on the cost of comparing, vs that of hashing, the items in a1 and a2. Differences are likely to be minor, anyway.

Alex Martelli
Oh, thanks. I seem to have forgot about that; I just discovered setdefault and it just seemed convenient. Going to change it in the question.
Debilski
@Debilski, yes, `setdefault` is often an "attractive nuisance"... it _seems_ convenient, but can end up sapping your performance!-)
Alex Martelli
Hmm. Sorting the input looks good as well. `key = tuple(sorted((ta1, ta2)))`. I’ll keep that in mind.
Debilski