Hi all!

I'm fighting a memory leak in a Python project and have already spent a lot of time on it. I have reduced the problem to a small example. Now it seems I know the fix, but I can't understand why it works.

import random

def main():
    d = {}
    used_keys = []
    n = 0
    while True:
        # choose a key unique enough among used previously
        key = random.randint(0, 2 ** 60)
        d[key] = 1234 # the value doesn't matter
        used_keys.append(key)
        n += 1
        if n % 1000 == 0:
            # clean up every 1000 iterations
            print 'thousand'
            for key in used_keys:
                del d[key]
                used_keys[:] = []
                #used_keys = []

if __name__ == '__main__':
    main()

The idea is that I store some values in the dict d and remember the used keys in a list, so that I can clean the dict from time to time.

This version of the program steadily eats memory and never gives it back. If I use the alternative way to "clear" used_keys, the one commented out in the example, all is fine: memory consumption stays constant.

Why?

Tested with CPython on many Linux distributions.

A:

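Here's the reason: the current method does not delete the keys from the dict (only one of them, actually). This is because you clear the used_keys list inside the loop that iterates over it, so the loop exits after its first pass. The other 999 keys are never deleted, and since used_keys is now empty they are never tracked again, so d keeps growing.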
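To see the effect, here is a minimal sketch that mimics the question's loop under two assumptions: the loop is bounded, and sequential integers stand in for the random keys so the result is deterministic. leaky_cleanup is a hypothetical name used only for illustration; the code should run under Python 2 or 3.

```python
def leaky_cleanup(iterations, batch=1000):
    # Hypothetical helper: the question's loop, bounded and deterministic.
    d = {}
    used_keys = []
    for n in range(1, iterations + 1):
        key = n  # assumption: unique sequential keys instead of random.randint
        d[key] = 1234  # the value doesn't matter
        used_keys.append(key)
        if n % batch == 0:
            for key in used_keys:
                del d[key]
                # Same (mis-indented) line as in the question: it empties the
                # list being iterated, so the for loop stops after one key.
                used_keys[:] = []
    return len(d)

print(leaky_cleanup(3000))  # 2997: only 1 key per batch of 1000 was deleted
```

Each cleanup removes a single key and forgets the other 999, so d grows by 999 entries per batch, which matches the unbounded growth described in the question.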

The second (commented-out) method, however, does work: used_keys = [] binds the name to a new list, while the for loop keeps iterating over the original list object, so the loop finishes and every key gets deleted.

See the difference between:

>>> a=[1,2,3]
>>> for x in a:
...    print x
...    a=[]
...
1
2
3

and

>>> a=[1,2,3]
>>> for x in a:
...    print x
...    a[:] = []
...
1
>>>
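A minimal sketch of the more straightforward fix, under the same assumptions as before (bounded loop, sequential keys, hypothetical helper name fixed_cleanup): delete every recorded key first, then empty the list once, after the loop over it has finished.

```python
def fixed_cleanup(iterations, batch=1000):
    # Hypothetical helper: same loop as the question, with the reset moved out.
    d = {}
    used_keys = []
    for n in range(1, iterations + 1):
        key = n  # assumption: unique sequential keys instead of random.randint
        d[key] = 1234
        used_keys.append(key)
        if n % batch == 0:
            for key in used_keys:
                del d[key]
            # Empty the list only after iterating over it. This clears the
            # same list object in place; used_keys.clear() also works on 3.3+.
            del used_keys[:]
    return len(d)

print(fixed_cleanup(3000))  # 0
print(fixed_cleanup(3500))  # 500
```

Rebinding used_keys = [] inside the inner loop also happens to work, but only because the iterator still holds the old list; resetting the list after the loop states the intent directly.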
adamk
Ah!! I'm stupid, stupid, stupid. I was so happy to have reproduced the memory leak in a small snippet… It's a sad mistake, of course, and it doesn't represent my real problem, so I'm going to keep hunting. But your answer to the original question is right. Thanks!
nailxx
A: 

Why wouldn't something like this work?

from itertools import count
import uuid

def main():
    d = {}
    for n in count(1):
        # choose a key unique enough among used previously
        key = uuid.uuid1()
        d[key] = 1234 # the value doesn't matter
        if n % 1000 == 0:
            # clean up every 1000 iterations
            print 'thousand'
            d.clear()

if __name__ == '__main__':
    main()
gnibbler