I have a Python function with a deterministic result. It takes a long time to run and generates a large output:

def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result

I modify time_consuming_function from time to time, but I would like to avoid having it run again while it's unchanged. [time_consuming_function only depends on functions that are immutable for the purposes considered here; i.e. it might have functions from Python libraries but not from other pieces of my code that I'd change.] The solution that suggests itself to me is to cache the output and also cache some "hash" of the function. If the hash changes, the function will have been modified, and we have to re-generate the output.

Is this possible or ridiculous?


Updated: based on the answers, it looks like what I want to do is to "memoize" time_consuming_function, except that instead of (or in addition to) the arguments passed into an invariant function, I want to account for a function that itself will change.
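
For concreteness, a minimal sketch of the idea described above, hashing the function's source text and pickling the output alongside it (the function and file names here are placeholders, not anything from an existing library):

import hashlib
import inspect
import os
import pickle

def cached_result(func, cache_file='result.cache'):
    # Hash the function's source text; if it changed, recompute and re-cache.
    src_hash = hashlib.sha1(inspect.getsource(func).encode()).hexdigest()
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            stored_hash, result = pickle.load(f)
        if stored_hash == src_hash:
            return result
    result = func()
    with open(cache_file, 'wb') as f:
        pickle.dump((src_hash, result), f, pickle.HIGHEST_PROTOCOL)
    return result

With something like this, the_result would be obtained as cached_result(time_consuming_function) and only recomputed when the function's source changes.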

+1  A: 

Rather than putting the function in a string, I would put the function in its own file. Call it time_consuming.py, for example. It would look something like this:

import os

def time_consuming_method():
    # your existing method here
    pass

# data_file_name, save_data, and load_data are assumed to be defined elsewhere.
# Is the cached data older than this file?
if (not os.path.exists(data_file_name)
        or os.stat(data_file_name).st_mtime < os.stat(__file__).st_mtime):
    data = time_consuming_method()
    save_data(data_file_name, data)
else:
    data = load_data(data_file_name)

# redefine the method so later callers just get the cached data
def time_consuming_method():
    return data

While testing the infrastructure for this to work, I'd comment out the slow parts. Make a simple function that just returns 0, get all of the save/load stuff working to your satisfaction, then put the slow bits back in.
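
For instance, the save_data and load_data helpers referenced above (which the snippet leaves undefined) could be as simple as this sketch, assuming pickle is acceptable for the output:

import pickle

def save_data(filename, data):
    # Serialize the computed result to disk.
    with open(filename, 'wb') as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

def load_data(filename):
    # Read the previously cached result back in.
    with open(filename, 'rb') as f:
        return pickle.load(f)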

Daniel Stutzbach
A: 

What you describe is effectively memoization. Most ordinary functions can be memoized by defining a decorator.

An (overly simplified) example:

def memoized(f):
    cache = {}
    def memo(*args):
        if args in cache:
            return cache[args]
        ret = f(*args)
        cache[args] = ret
        return ret
    return memo

@memoized
def time_consuming_method():
    # lots_of_computing_time to come up with the_result
    return the_result

Edit:

From Mike Graham's comment and the OP's update, it is now clear that values need to be cached across different runs of the program. This can be done by using some form of persistent storage for the cache (e.g. something as simple as pickle or a plain text file, or a full-blown database, or anything in between). The choice of method depends on what the OP needs. Several other answers already cover this in more detail, so beyond the bare sketch below I won't repeat them here.
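
A minimal, pickle-backed variant of the decorator above (illustrative only; the cache file name is arbitrary and the results are assumed to be picklable):

import os
import pickle

def memoized_to_disk(f, cache_file='memo_cache.pkl'):
    # Load any previously saved cache from disk.
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as fp:
            cache = pickle.load(fp)
    else:
        cache = {}

    def memo(*args):
        if args not in cache:
            cache[args] = f(*args)
            # Persist the cache after computing each new result.
            with open(cache_file, 'wb') as fp:
                pickle.dump(cache, fp, pickle.HIGHEST_PROTOCOL)
        return cache[args]
    return memo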

MAK
It appears that OP wants to memoize a function's one return value between runtimes. This caches a function's various return values during one run based on various arguments.
Mike Graham
@Mike Graham: Thanks, I updated my answer.
MAK
A: 

So, here is a really neat trick using decorators:

def memoize(f):
    cache = {}
    def result(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return result

With the above, you can then use:

@memoize
def myfunc(x, y, z):
    # Some really long-running computation

When you invoke myfunc, you will actually be invoking the memoized version of it. Pretty neat, huh? Whenever you want to redefine your function, simply use "@memoize" again, or explicitly write:

myfunc = memoize(new_definition_for_myfunc)

Edit:
I didn't realize that you wanted to cache between multiple runs. In that case, you can do the following:

import os
import os.path
import cPickle

class MemoizedFunction(object):
    def __init__(self, f):
        self.function = f
        self.filename = str(hash(f)) + ".cache"
        self.cache = {}
        if os.path.exists(self.filename):
            with open(self.filename, 'rb') as fp:
                self.cache = cPickle.load(fp)

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.function(*args)
        return self.cache[args]

    def __del__(self):
        # Persist the cache when the object is torn down.
        with open(self.filename, 'wb') as fp:
            cPickle.dump(self.cache, fp, cPickle.HIGHEST_PROTOCOL)

def memoize(f):
    return MemoizedFunction(f)
Michael Aaron Safyan
Pretty neat, yes, but doesn't answer the question. The point was that the code in the method changed, not the input parameters to it.
Lasse V. Karlsen
It appears that OP wants to memoize a function's one return value between runtimes. This caches a function's various return values during one run based on various arguments.
Mike Graham
@Mike, ok. I didn't realize this was between runs of the program.
Michael Aaron Safyan
@Mike, ok, I've updated to cache between runs.
Michael Aaron Safyan
@Michael, Your current revision relies on `hash(f)` to be the same between runs, which is neither guaranteed nor expected.
Mike Graham
+5  A: 

If I understand your problem, I think I'd tackle it like this. It's a touch evil, but I think it's more reliable and on-point than the other solutions I see here.

import inspect
import functools
import json

def memoize_zeroadic_function_to_disk(memo_filename):
    def decorator(f):
        try:
            with open(memo_filename, 'r') as fp:
                cache = json.load(fp)
        except IOError:
            # file doesn't exist yet
            cache = {}

        source = inspect.getsource(f)

        @functools.wraps(f)
        def wrapper():
            if source not in cache:
                cache[source] = f()
                with open(memo_filename, 'w') as fp:
                    json.dump(cache, fp)

            return cache[source]
        return wrapper
    return decorator

@memoize_zeroadic_function_to_disk(...SOME PATH HERE...)
def time_consuming_function():
    # lots_of_computing_time to come up with the_result
    return the_result
Mike Graham
So the only hashing that goes on is Python's internal dictionary key hashing, where the key is the string value of a function's entire uncompiled code. Is there a way to get the compiled code of the function, so changing the line spacing or comments won't result in a different value?
Seth Johnson
@Seth, yes, it makes sense to me to use Python's internal hashing here, since what you really want is to compare the values themselves (lest you have hash collisions and not know it; unlikely, but entirely possible). If you only wanted to cache the most recent version of the function, I wouldn't use a dict or hashing at all, but just compare the value. I only used the dict because you said (out-of-band) that you want to store multiple versions of the function so you can return to old code.
Mike Graham
@Seth, it should be possible to store less information than the whole source code (with whitespace, comments, and so on intact, as here), but it will take some care to be sure you have a necessary and sufficient condition for a match. `f.func_code.co_code` is the bytecode that the function actually stores, but I'm not sure it is guaranteed to be the same between compiles, and I'm not completely sure it can't give you false positives.
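A very rough sketch of that idea (illustrative only, using the Python 2 `func_code` attribute mentioned above; as noted, co_code stability between compiles is not guaranteed):

import hashlib

def code_key(f):
    # Key on the compiled bytecode instead of the raw source text, so that
    # whitespace- or comment-only edits do not invalidate the cache.
    return hashlib.sha1(f.func_code.co_code).hexdigest()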
Mike Graham
A: 

The first part is memoization and serialization of your lookup table. That should be straightforward enough with any Python serialization library. The second part is that you want to delete your serialized lookup table when the source code changes. Perhaps this is being overthought into some fancy solution. Presumably when you change the code you check it in somewhere? Why not add a hook to your check-in routine that deletes your serialized table? Or, if this is not research data and is in production, make it part of your release process: if the revision number of the file (put this function in its own file) has changed, your release script deletes the serialized lookup table.
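
For illustration, a release-script fragment in that spirit might look like this sketch, which uses a content hash of the source file in place of a VCS revision number (all file names are placeholders):

import hashlib
import os

SOURCE = 'time_consuming.py'   # file containing the slow function (placeholder name)
CACHE = 'lookup_table.pkl'     # serialized lookup table (placeholder name)
STAMP = CACHE + '.srchash'     # where the last-seen source hash is remembered

with open(SOURCE, 'rb') as f:
    current = hashlib.sha1(f.read()).hexdigest()

previous = open(STAMP).read().strip() if os.path.exists(STAMP) else None

if current != previous:
    # The source changed since the table was serialized: throw the table away.
    if os.path.exists(CACHE):
        os.remove(CACHE)
    with open(STAMP, 'w') as f:
        f.write(current)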

frankc