I have a program that processes several files and generates a report for each one. The report generation is a separate function that takes a filename and then returns. During report generation, intermediate results are cached in memory to avoid recalculating them, as they may be used by several parts of the report.

When I run this program on all files in a directory, it will run for a while, then crash with a MemoryError. If I then rerun it on the same directory, it will skip all files that it successfully created a report for, and continue on. It will process a couple of files before crashing again.

Now, why aren't all resources cleared, or at least marked for garbage collection, after the function call that generates the report returns? No instances leave the function, I am not using any global objects, and all open files are closed after each file is processed.

Is there a way for me to verify that there are no extra references to an object? And is there a way to force garbage collection in Python?

A bit more detail about the implementation and the cache: each report has several elements, each element can rely on different computations, and each computation can depend on other computations. If a computation has already been done, I don't want to do it again (most of them are expensive).

Here is an abbreviated version of the cache:

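class MathCache:
    def __init__(self):
        self.cache = {}  # maps data_provider.id -> computed value

    def get(self, data_provider):
        # Compute each value at most once; self is passed on so that a
        # provider can look up the computations it depends on.
        if data_provider.id not in self.cache:
            self.cache[data_provider.id] = data_provider.get_value(self)
        return self.cache[data_provider.id]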

An instance of it is created and then passed to each element in the report. The only reference to this instance is a local variable in the report creation function.

All data_providers inherit from a common class that creates a unique id for the instance, based on a hash of the constructor arguments and the class name. I pass the MathCache on to get_value because the data_provider itself may rely on other calculations.
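For illustration, here is a minimal sketch of how these pieces might fit together. The names `DataProvider`, `Square` and `SumOfSquares`, and the SHA-1 based id, are assumptions made up for this example and not the actual code; only `MathCache`, `id` and `get_value` come from the description above.

```python
import hashlib


class MathCache:
    def __init__(self):
        self.cache = {}

    def get(self, data_provider):
        if data_provider.id not in self.cache:
            self.cache[data_provider.id] = data_provider.get_value(self)
        return self.cache[data_provider.id]


class DataProvider:
    """Hypothetical base class; the real one is not shown here."""

    def __init__(self, *args):
        self.args = args
        # Assumed scheme: hash the class name together with the arguments.
        key = repr((type(self).__name__, args)).encode("utf-8")
        self.id = hashlib.sha1(key).hexdigest()

    def get_value(self, cache):
        raise NotImplementedError


class Square(DataProvider):
    """Hypothetical computation without dependencies."""

    def get_value(self, cache):
        return self.args[0] ** 2


class SumOfSquares(DataProvider):
    """Hypothetical computation that depends on other computations."""

    def get_value(self, cache):
        # Dependencies are requested through the cache, so each one is
        # computed at most once per MathCache instance.
        return sum(cache.get(Square(x)) for x in self.args)


cache = MathCache()
total = cache.get(SumOfSquares(1, 2, 3))
```

In this sketch the cache ends up holding one entry for the sum and one per square, and everything it holds stays alive for as long as the `MathCache` instance itself is referenced.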

+3  A: 

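You should check out the gc module: http://docs.python.org/library/gc.html#module-gc. It lets you force a collection with gc.collect() and list the objects that still refer to a given object with gc.get_referrers().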
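A minimal sketch of how the module can answer both questions, assuming CPython. `Report` and `make_report` are hypothetical stand-ins for the asker's code, and the reference cycle is put there on purpose to show a case that reference counting alone does not clean up.

```python
import gc
import weakref


class Report:
    """Hypothetical stand-in for an object built during report generation."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


def make_report(name):
    """Hypothetical report function that leaves a reference cycle behind."""
    report = Report(name)
    report.cache["self"] = report  # cycle: refcounting alone cannot free it
    return weakref.ref(report)     # a weak reference does not keep it alive


gc.disable()  # keep automatic collection from interfering with the demo
ref = make_report("a.txt")
alive_before = ref() is not None  # the cycle still keeps the object alive
unreachable = gc.collect()        # force a full collection
alive_after = ref() is not None   # the object is gone after collecting
gc.enable()

# To verify who still refers to a live object, ask the collector.
leaked = Report("b.txt")
holder = {"kept": leaked}  # an "extra" reference we want to find
referrers = gc.get_referrers(leaked)
found_holder = any(r is holder for r in referrers)
```

If the weak reference is still alive after `gc.collect()`, something reachable is holding on to the object, and `gc.get_referrers()` is the place to start looking for it.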

bayer