ansaurus

Question

Caching the results of a function with two parameters in Python

Answer 1

A:

You can either combine the two values into one hash value like str(a)+'\0'+str(b) and then put it into a cache, or you can make a two-dimensional array so that cache[a][b] returns the value you want.

You may also want to turn this functionality into a @decorator kind of function, then you can reuse the decorator on several functions and maintain one code location for the caching.

eruciform 2010-07-08 19:47:15

I like the idea with the decorator. But the one with the combined hash does not seem good, because it will result in different hashes for `a, b` and `b, a`, or am I missing something here.

Space_C0wb0y 2010-07-08 19:51:20

@cowboy: you could fill both cache[a][b] and cache[b][a] if it's symmetrical

eruciform 2010-07-08 20:27:34

Answer 2

A:

I'd just store it in the cache twice, once for each ordering.

def foo(self, a, b):
    try:
        return self._cache[(a, b)]
    except KeyError:
        value = self._calculate(a, b)
        self._cache[(a, b)] = self._cache[(b, a)] = value
        return value

Chris B. 2010-07-08 19:52:29

Answer 3

A:

You could use beaker.cache for that (http://beaker.groovie.org/index.html)

# define a cache manager
cache = CacheManager(dict_of_config_options)

# in your class, apply a cache decorator
@cache.cache('mycache')
def foo(self, a,b):
    return self._calculate

I think it works like you want by default. Not sure if this uses self as part of the key. It assumes that a, b are pickleable.

DiggyF 2010-07-08 20:12:10

Thanks, I'm all for using a library instead of reinventing the wheel, but in this case I believe it to be overkill.

Space_C0wb0y 2010-07-08 20:18:34

Answer 4

+1 A:

There's nothing particularly inefficient or complicated about how you implemented your caching; that's essentially what needs to happen. It isn't very general, however.

You can implement some sort of more generalized caching strategy, using decorators if you like, for convenience. One possible approach might be:

class Memoizer(object):
    def __init__(self):
        self._cache = dict()

    def memoize_unordered(self, f):
        def wrapper(s, *args, **kwargs):
            key = (s, f, frozenset(args), frozenset(kwargs.iteritems()))
            if key not in self._cache:
                print 'calculating', args, kwargs
                self._cache[key] = f(s, *args, **kwargs)
            return self._cache[key]
        return wrapper

    def memoize_ordered(self, f):
        def wrapper(s, *args, **kwargs):
            key = (s, f, tuple(args), frozenset(kwargs.iteritems()))
            if key not in self._cache:
                print 'calculating', args, kwargs
                self._cache[key] = f(s, *args, **kwargs)
            return self._cache[key]
        return wrapper

memoizer = Memoizer()

class Foo(object):

    @memoizer.memoize_unordered
    def foo(self, a, b):
        return self._calculate(a, b)

    def _calculate(self, a, b):
        return frozenset([a,b])

foo = Foo()


results = [foo.foo(*a) for a in [(1, 5), (1, 5), (5, 1), (9, 12), (12, 9)]]
for result in results:
    print 'RESULT', result

printing:

calculating (1, 5) {}
calculating (9, 12) {}
RESULT frozenset([1, 5])
RESULT frozenset([1, 5])
RESULT frozenset([1, 5])
RESULT frozenset([9, 12])
RESULT frozenset([9, 12])

The downside of course, to implementing caching outside of your object, is that the cache data doesn't get deleted when your object goes away, unless you take some care to make this happen.

Matt Anderson 2010-07-08 20:14:48

+1 for decorator. :)

iamgopal 2010-07-09 07:49:50

weakref.WeakKeyDictionary and weakref.WeakValueDictionary are there to address the "object lives forever because of reference in memoizing cache" issue.

Paul McGuire 2010-07-09 09:20:16

Answer 5

A:

Your code has two problems:

(1) It uses the old dict.has_key() method which is slow and has vanished in Python 3.x. Instead use "key in dict" or "key not in dict".

So:

def foo(self, a, b):
    params = frozenset([a, b])
    if params in self._cache:
        self._cache[params] = self._calculate(a, b)
    return self._cache[params]

(2) "key in dict" is more readable, and exposes the much worse problem: your code doesn't work! If the args are in the dict, it recalculates. If they're not in the dict, it will blow up with a KeyError. Consider copy/paste instead of typing from memory.

So:

def foo(self, a, b):
    params = frozenset([a, b])
    if params not in self._cache:
        self._cache[params] = self._calculate(a, b)
    return self._cache[params]

(3) Some more efficiency suggestions:

def foo(self, a, b):
    if a < b:
        params = (a, b)
    else:
        params = (b, a)
    try:
        return self._cache[params]
    except KeyError:
        v = self._cache[params] = self._calculate(a, b)
        return v

John Machin 2010-07-08 21:02:40

The try except block will only be more efficient if he expects to hit the cache more than 99% of the time. http://stackoverflow.com/questions/3111195/python-performance-try-except-or-not-in

Wilduck 2010-07-08 21:29:31

@Wilduck: That benchmark didn't take account of the time taken for the _calculate method in a caching environment, which is likely to make the "not in/try/get" question rather marginal. Benchmarks of the actual application are indicated.

John Machin 2010-07-08 22:18:25

@John: You are right. All my benchmarks have indicated that any kind of tweaking suggested in any of the answers here does not have any measurable performance impact (although the basic caching I implemented gives a x3 performance increase on a small data-set).

Space_C0wb0y 2010-07-09 08:12:53

@John Machin: True. I guess it comes down to preference/readability at that point.

Wilduck 2010-07-09 14:21:25

ansaurus

tags:

views:

answers:

Caching the results of a function with two parameters in Python

related questions