views:

102

answers:

2

I know that Python dicts will "leak" when items are removed (because the item's slot will be overwritten with the magic "removed" value)… But will the set class behave the same way? Is it safe to keep a set around, adding and removing stuff from it over time?

Edit: Alright, I've tried it out, and here's what I found:

>>> import gc
>>> gc.collect()
0
>>> nums = range(1000000)
>>> gc.collect()
0
### rsize: 20 megs
### A baseline measurement
>>> s = set(nums)
>>> gc.collect()
0
### rsize: 36 megs
>>> for n in nums: s.remove(n)
>>> gc.collect()
0
### rsize: 36 megs
### Memory usage doesn't drop after removing every item from the set…
>>> s = None
>>> gc.collect()
0
### rsize: 20 megs
### … but nulling the reference to the set *does* free the memory.
>>> s = set(nums)
>>> for n in nums: s.remove(n)
>>> for n in nums: s.add(n)
>>> gc.collect()
0
### rsize: 36 megs
### Removing then re-adding keys uses a constant amount of memory…
>>> for n in nums: s.remove(n)
>>> for n in nums: s.add(n+1000000)
>>> gc.collect()
0
### rsize: 47 megs
### … but adding new keys uses more memory.
+5  A: 

Yes, set is basically a hash table just like dict -- the differences at the interface don't imply many differences "below" it. Once in a while, you should copy the set -- myset = set(myset) -- just like you should for a dict on which many additions and removals are regularly made over time.

Alex Martelli
So does dict leak?
Anurag Uniyal
@Anurag, dict can consume growing amount of memory and slow things down if you keep adding and removing keys in a long-running program.
Alex Martelli
Calling this behavior "leak" is over-simplifying and misleading, but the point is that making a fresh copy once in a while for a high-churn dict or set can save memory *and* speed things up!
Alex Martelli
+1  A: 

For questions like these it is often best to run a quick experiment like this one and see what happens:

s = set()
for a in range(1000):
  for b in range(10000000):
    s.add(b)
  for b in range(10000000):
    s.remove(b)

What docs and people say and what behaviour actually is are often at odds. If this is important for you, test it. Don't rely on others.

JUST MY correct OPINION
For the record I'm seeing no evidence of a leak in that code. Memory use jumped up once and then stayed flat.
JUST MY correct OPINION
normally I'd agree… But when the person doing the telling is Alex Martelli, I'm going to just go ahead and believe him.
David Wolever
@ttmrichter yes, that's the problem - at the end (after all the stuff has been removed), it should drop down again. And being a GC'd language, it's harder to know when all the memory that *could* be reclaimed *has* been reclaimed.
David Wolever
@David Wolever - the test needs to be altered according to what you're testing for. The point was that you need to test, not that this particular test was the one to run. ;)
JUST MY correct OPINION
Alright, well, I've run some tests - see my edits.
David Wolever