tags:

views:

71

answers:

5

I have a method with two parameters that does some complex computation. It is called very often with the same parameters, so I am using a dictionary for caching. Currently this looks something like this:

def foo(self, a, b):
    params = frozenset([a, b])
    if not params in self._cache:
        self._cache[params] = self._calculate(a, b)
    return self._cache[params]

The reason for building the frozenset is that the parameters can be in any order, but the result will be the same. I am wondering if there is a simpler (and most importantly more efficient) solution for this.

A: 

You can either combine the two values into one hash value like str(a)+'\0'+str(b) and then put it into a cache, or you can make a two-dimensional array so that cache[a][b] returns the value you want.

You may also want to turn this functionality into a @decorator kind of function, then you can reuse the decorator on several functions and maintain one code location for the caching.

eruciform
I like the idea with the decorator. But the one with the combined hash does not seem good, because it will result in different hashes for `a, b` and `b, a`, or am I missing something here.
Space_C0wb0y
@cowboy: you could fill both cache[a][b] and cache[b][a] if it's symmetrical
eruciform
A: 

I'd just store it in the cache twice, once for each ordering.

def foo(self, a, b):
    try:
        return self._cache[(a, b)]
    except KeyError:
        value = self._calculate(a, b)
        self._cache[(a, b)] = self._cache[(b, a)] = value
        return value
Chris B.
A: 

You could use beaker.cache for that (http://beaker.groovie.org/index.html)

# define a cache manager
cache = CacheManager(dict_of_config_options)

# in your class, apply a cache decorator
@cache.cache('mycache')
def foo(self, a,b):
    return self._calculate

I think it works like you want by default. Not sure if this uses self as part of the key. It assumes that a, b are pickleable.

DiggyF
Thanks, I'm all for using a library instead of reinventing the wheel, but in this case I believe it to be overkill.
Space_C0wb0y
+1  A: 

There's nothing particularly inefficient or complicated about how you implemented your caching; that's essentially what needs to happen. It isn't very general, however.

You can implement some sort of more generalized caching strategy, using decorators if you like, for convenience. One possible approach might be:

class Memoizer(object):
    def __init__(self):
        self._cache = dict()

    def memoize_unordered(self, f):
        def wrapper(s, *args, **kwargs):
            key = (s, f, frozenset(args), frozenset(kwargs.iteritems()))
            if key not in self._cache:
                print 'calculating', args, kwargs
                self._cache[key] = f(s, *args, **kwargs)
            return self._cache[key]
        return wrapper

    def memoize_ordered(self, f):
        def wrapper(s, *args, **kwargs):
            key = (s, f, tuple(args), frozenset(kwargs.iteritems()))
            if key not in self._cache:
                print 'calculating', args, kwargs
                self._cache[key] = f(s, *args, **kwargs)
            return self._cache[key]
        return wrapper

memoizer = Memoizer()

class Foo(object):

    @memoizer.memoize_unordered
    def foo(self, a, b):
        return self._calculate(a, b)

    def _calculate(self, a, b):
        return frozenset([a,b])

foo = Foo()


results = [foo.foo(*a) for a in [(1, 5), (1, 5), (5, 1), (9, 12), (12, 9)]]
for result in results:
    print 'RESULT', result

printing:

calculating (1, 5) {}
calculating (9, 12) {}
RESULT frozenset([1, 5])
RESULT frozenset([1, 5])
RESULT frozenset([1, 5])
RESULT frozenset([9, 12])
RESULT frozenset([9, 12])

The downside of course, to implementing caching outside of your object, is that the cache data doesn't get deleted when your object goes away, unless you take some care to make this happen.

Matt Anderson
+1 for decorator. :)
iamgopal
weakref.WeakKeyDictionary and weakref.WeakValueDictionary are there to address the "object lives forever because of reference in memoizing cache" issue.
Paul McGuire
A: 

Your code has two problems:

(1) It uses the old dict.has_key() method which is slow and has vanished in Python 3.x. Instead use "key in dict" or "key not in dict".

So:

def foo(self, a, b):
    params = frozenset([a, b])
    if params in self._cache:
        self._cache[params] = self._calculate(a, b)
    return self._cache[params]

(2) "key in dict" is more readable, and exposes the much worse problem: your code doesn't work! If the args are in the dict, it recalculates. If they're not in the dict, it will blow up with a KeyError. Consider copy/paste instead of typing from memory.

So:

def foo(self, a, b):
    params = frozenset([a, b])
    if params not in self._cache:
        self._cache[params] = self._calculate(a, b)
    return self._cache[params]

(3) Some more efficiency suggestions:

def foo(self, a, b):
    if a < b:
        params = (a, b)
    else:
        params = (b, a)
    try:
        return self._cache[params]
    except KeyError:
        v = self._cache[params] = self._calculate(a, b)
        return v
John Machin
The try except block will only be more efficient if he expects to hit the cache more than 99% of the time. http://stackoverflow.com/questions/3111195/python-performance-try-except-or-not-in
Wilduck
@Wilduck: That benchmark didn't take account of the time taken for the _calculate method in a caching environment, which is likely to make the "not in/try/get" question rather marginal. Benchmarks of the actual application are indicated.
John Machin
@John: You are right. All my benchmarks have indicated that any kind of tweaking suggested in any of the answers here does not have any measurable performance impact (although the basic caching I implemented gives a x3 performance increase on a small data-set).
Space_C0wb0y
@John Machin: True. I guess it comes down to preference/readability at that point.
Wilduck