views:

142

answers:

3

I'm trying to understand the internals of the CPython garbage collector, specifically when the destructor is called. So far, the behavior is intuitive, but the following case trips me up:

  1. Disable the GC.
  2. Create an object, then remove a reference to it.
  3. The object is destroyed and the __del__ method is called.

I thought this would only happen if the garbage collector was enabled. Can someone explain why this happens? Is there a way to defer calling the destructor?

import gc
import unittest

_destroyed = False

class MyClass(object):

    def __del__(self):
        global _destroyed
        _destroyed = True

class GarbageCollectionTest(unittest.TestCase):

    def testExplicitGarbageCollection(self):
        gc.disable()
        ref = MyClass()
        ref = None
        # The next test fails. 
        # The object is automatically destroyed even with the collector turned off.
        self.assertFalse(_destroyed) 
        gc.collect()
        self.assertTrue(_destroyed)

if __name__=='__main__':
    unittest.main()

Disclaimer: this code is not meant for production -- I've already noted that this is very implementation-specific and does not work on Jython.

+9  A: 

Python has both reference counting garbage collection and cyclic garbage collection, and it's the latter that the gc module controls. Reference counting can't be disabled, and hence still happens when the cyclic garbage collector is switched off.

Since there are no references left to your object after ref = None, its __del__ method is called as a result of its reference count going to zero.

There's a clue in the documentation: "Since the collector supplements the reference counting already used in Python..." (my emphasis).

You can stop the first assertion from firing by making the object refer to itself, so that its reference count doesn't go to zero, for instance by giving it this constructor:

def __init__(self):
    self.myself = self

But if you do that, the second assertion will fire. That's because garbage cycles with __del__ methods don't get collected - see the documentation for gc.garbage.

RichieHindle
+3  A: 

Depending on your definition of garbage collector, CPython has two garbage collectors, the reference counting one, and the other one.
The reference counter is always working, and cannot be turned off, as it's quite a fast and lightweight one that does not sigificantly affect the run time of the system.
The other one (some varient of mark and sweep, I think), gets run every so often, and can be disabled. This is because it requires the interpreter to be paused while it is running, and this can happen at the wrong moment, and consume quite a lot of CPU time.
This ability to disable it is there for those time when you expect to be doing something that's time critical, and the lack of this GC won't cause you any problems.

Simon Callan
Is this "two garbage collectors" implementation documented somewhere?
Frederik
Have a look at Alex Martelli's answer, and its associated links. It's probably better than anything else I could come up with.
Simon Callan
+2  A: 

The docs here explain how what's called "the optional garbage collector" is actually a collector of cyclic garbage (the kind that reference counting wouldn't catch). Reference counting is explained here, with a nod to its interplay with the cyclic gc:

While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.

Alex Martelli