ansaurus

Question

Why are Java and Python garbage collection methods different?

Answer 1

+2 A:

I think the article "Java theory and practice: A brief history of garbage collection" from IBM should help explain some of the questions you have.

Espo 2008-08-22 07:40:12

Answer 2

+16 A:

There are drawbacks to using reference counting. One of the most often mentioned is circular references: suppose A references B, B references C and C references B. If A were to drop its reference to B, both B and C would still have a reference count of 1 and would not be deleted by traditional reference counting. CPython (reference counting is not part of Python itself, but part of the C implementation thereof) catches circular references with a separate garbage collection routine that it runs periodically...
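A minimal sketch of that A, B, C scenario, assuming CPython (other implementations may reclaim objects at different times). The Node class and the function name are made up for illustration; gc and weakref are standard library modules. The automatic cycle collector is switched off for the duration, so only reference counting is at work until gc.collect() is called explicitly.

```python
import gc
import weakref


class Node:
    """Hypothetical object that holds at most one reference to another Node."""

    def __init__(self, name):
        self.name = name
        self.ref = None


def cycle_survives_refcounting():
    gc.disable()  # keep the periodic cycle collector out of the picture
    try:
        a, b, c = Node("A"), Node("B"), Node("C")
        a.ref = b  # A references B
        b.ref = c  # B references C
        c.ref = b  # C references B, which closes the cycle
        probe = weakref.ref(c)  # lets us observe C without keeping it alive
        del b, c
        a.ref = None  # A drops its reference to B
        # B and C now only reference each other, so each count is still 1.
        alive_before = probe() is not None
        gc.collect()  # the separate cycle-detecting pass
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        gc.enable()
```

On CPython the function should report that C is still alive after A lets go, and gone once the cycle collector has run.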

Another drawback: reference counting can make execution slower. Each time a reference to an object is created or dropped, the interpreter/VM must update the count, and on every decrement check whether it has gone down to 0 (and deallocate if it has). Garbage collection does not need to do this.
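To make that bookkeeping visible, here is a small sketch, again assuming CPython. sys.getrefcount is a real standard library function; count_deltas is just an illustrative name. Only differences between readings are compared, because the absolute number includes temporary references held by the call itself.

```python
import sys


def count_deltas():
    obj = object()
    base = sys.getrefcount(obj)
    holders = [obj, obj, obj]  # three new references, so three increments
    after_add = sys.getrefcount(obj) - base
    holders.clear()  # three references dropped, so three decrements
    after_drop = sys.getrefcount(obj) - base
    return after_add, after_drop
```

Every one of those increments and decrements is work the interpreter does on the spot, which is the overhead described above.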

Also, garbage collection can be done in a separate thread (though it can be a bit tricky). On machines with lots of RAM and for processes that use memory only slowly, you might not want to be doing GC at all! Reference counting would be a big drawback there in terms of performance...

Daren Thomas 2008-08-22 09:10:06

An additional difference worth noting is that eager GC via reference counting always uses "minimal" memory (except in the circular dependency case), whereas Java's lazy approach may cause the JVM to temporarily use far more memory than actually needed, until a GC run brings it back into line. Java's approach gives speed at the cost of memory, and has the advantage when memory is plentiful. When it is scarce, Python's approach will work better.

Lars Yencken 2010-03-15 05:12:56

I've been working on solving the reference counting problem with so-called "circular references," and I believe I've solved it. The work is still in progress, and there is much more room to improve it. http://goo.gl/n3T6

mtasic 2010-10-31 08:34:59

Answer 3

+5 A:

Daren Thomas gives a good answer. However, one big difference between the Java and Python approaches is that with reference counting in the common case (no circular references) objects are cleaned up immediately rather than at some indeterminate later date.

For example, I can write sloppy, non-portable code in CPython such as

def parse_some_attrs(fname):
    return open(fname).read().split("~~~")[2:4]

and the file descriptor for that file I opened will be cleaned up immediately because as soon as the reference to the open file goes away, the file is garbage collected and the file descriptor is freed. Of course, if I run Jython or IronPython or possibly PyPy, then the garbage collector won't necessarily run until much later; possibly I'll run out of file descriptors first and my program will crash.

So you SHOULD be writing code that looks like

def parse_some_attrs(fname):
    with open(fname) as f:
        return f.read().split("~~~")[2:4]

but sometimes people like to rely on reference counting to always free up their resources because it can sometimes make your code a little shorter.
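As a rough check of the with-based version, the sketch below writes a throwaway file and confirms that the file object is already closed when the block exits, without relying on any garbage collector. The function name and the sample contents are invented for the example; tempfile and os are standard library modules.

```python
import os
import tempfile


def closed_after_with():
    # Create a scratch file with five "~~~"-separated fields.
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as out:
            out.write("a~~~b~~~c~~~d~~~e")
        with open(path) as f:
            fields = f.read().split("~~~")[2:4]
        # f is closed here because the with block exited,
        # not because a collector happened to run.
        return fields, f.closed
    finally:
        os.remove(path)
```

This should behave the same on CPython, Jython, IronPython or PyPy, since closing is tied to leaving the block.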

I'd say that the best garbage collector is the one with the best performance, which currently seems to be the Java-style generational garbage collectors that can run in a separate thread and have all these crazy optimizations, etc. The differences in how you write your code should be negligible and ideally non-existent.

Eli Courtwright 2008-08-22 12:40:03

Answer 4

+2 A:

The latest Sun Java VMs actually have multiple GC algorithms which you can tweak. The Java VM specification intentionally omits any requirement on actual GC behaviour, to allow different (and multiple) GC algorithms for different VMs.

For example, for all the people who dislike the "stop-the-world" approach of the default Sun Java VM GC behaviour, there are VMs such as IBM's WebSphere Real Time which allow real-time applications to run on Java.

Since the Java VM spec is publicly available, there is (theoretically) nothing stopping anyone from implementing a Java VM that uses CPython's GC algorithm.

sundae1888 2008-08-22 22:58:36

It probably disallows simple reference counting (assuming you can't extend reference counting so that it provably never leaks under any circumstances). I'm not sure how Python deals with this, although I think it has at least some sort of check for cycles.

Roman A. Taycher 2010-09-26 08:54:48

Answer 5

+1 A:

Reference counting is particularly difficult to do efficiently in a multi-threaded environment. I don't know how you'd even start to do it without getting into hardware assisted transactions or similar (currently) unusual atomic instructions.

Reference counting is easy to implement. JVMs have had a lot of money sunk into competing implementations, so it shouldn't be surprising that they implement very good solutions to very difficult problems. However, it's becoming increasingly easy to target your favourite language at the JVM.

Tom Hawtin - tackline 2008-09-05 20:03:19

Answer 6

+4 A:

Garbage collection is faster (more time efficient) than reference counting, if you have enough memory. For example, a copying gc traverses the "live" objects and copies them to a new space, and can reclaim all the "dead" objects in one step by marking a whole memory region. This is very efficient, if you have enough memory. Generational collections use the knowledge that "most objects die young"; often only a few percent of objects have to be copied.
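A toy illustration of the copying idea, with the heap modelled as a plain dict from object name to the names it references. This is only a sketch under that assumption, not how any real JVM lays out memory; copy_collect and demo_copying are made-up names. The point is that the loop only ever touches live objects, and the dead ones are never visited at all.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from roots into a fresh to-space."""
    to_space = {}
    pending = list(roots)
    while pending:
        name = pending.pop()
        if name in to_space:
            continue  # already copied
        to_space[name] = list(from_space[name])
        pending.extend(from_space[name])
    # The caller can now discard from_space in one step.
    return to_space


def demo_copying():
    # 3 live objects in a chain, 97 dead temporaries that nothing points at.
    heap = {"root": ["x"], "x": ["y"], "y": []}
    heap.update({f"tmp{i}": [] for i in range(97)})
    survivors = copy_collect(heap, ["root"])
    return len(heap), sorted(survivors)
```

Here 100 objects are in the old space but only 3 get copied, which matches the "only a few percent" case described above.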

[This is also the reason why gc can be faster than malloc/free]

Reference counting is much more space efficient than garbage collection, since it reclaims memory the very moment an object becomes unreachable. This is nice when you want to attach finalizers to objects (e.g. to close a file once the File object becomes unreachable). A reference counting system can work even when only a few percent of the memory is free. But the management cost of having to increment and decrement counters upon each pointer assignment costs a lot of time, and some kind of garbage collection is still needed to reclaim cycles.
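The "precise finalizer" behaviour can be seen in CPython with a short sketch like this one. Resource and finalizer_runs_immediately are hypothetical names, and the ordering shown is a property of CPython's reference counting, not something the Python language guarantees.

```python
class Resource:
    """Hypothetical resource that records when its finalizer runs."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("finalized")


def finalizer_runs_immediately():
    log = []
    r = Resource(log)
    log.append("before del")
    del r  # last reference dropped, count reaches zero right here
    log.append("after del")
    return log
```

Under a tracing collector the "finalized" entry could appear much later, or not before the program ends.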

So the trade-off is clear: if you have to work in a memory-constrained environment, or if you need precise finalizers, use reference counting. If you have enough memory and need the speed, use garbage collection.

mfx 2008-09-16 16:38:56

Answer 7

+10 A:

Actually reference counting and the strategies used by the Sun JVM are all different types of garbage collection algorithms.

There are two broad approaches for tracking down dead objects: tracing and reference counting. In tracing, the GC starts from the "roots" (things like stack references) and traces all reachable (live) objects. Anything that can't be reached is considered dead. In reference counting, each time a reference is modified the objects involved have their counts updated. Any object whose reference count drops to zero is considered dead.
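The two approaches can be compared on a toy object graph. The sketch below models the heap as a dict of name to referenced names; trace_live and refcount_reclaim are illustrative names, and both are simplified models, not real collector code.

```python
def trace_live(roots, edges):
    """Tracing: everything reachable from the roots is live."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(edges[obj])
    return live


def refcount_reclaim(roots, edges):
    """Reference counting: free objects whose count reaches zero."""
    counts = {obj: 0 for obj in edges}
    for root in roots:
        counts[root] += 1
    for targets in edges.values():
        for target in targets:
            counts[target] += 1
    dead = set()
    worklist = [obj for obj, n in counts.items() if n == 0]
    while worklist:
        obj = worklist.pop()
        dead.add(obj)
        for target in edges[obj]:  # freeing obj drops its outgoing references
            counts[target] -= 1
            if counts[target] == 0:
                worklist.append(target)
    return dead


def compare():
    # A is a root; B and C form an unreachable cycle; D -> E is unreachable.
    edges = {"A": [], "B": ["C"], "C": ["B"], "D": ["E"], "E": []}
    dead_by_tracing = set(edges) - trace_live(["A"], edges)
    dead_by_counting = refcount_reclaim(["A"], edges)
    return sorted(dead_by_tracing), sorted(dead_by_counting)
```

Tracing finds B, C, D and E dead, while plain counting only frees D and E and leaves the B/C cycle behind.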

With basically all GC implementations there are trade-offs, but tracing is usually good for high-throughput (i.e. fast) operation at the price of longer pause times (larger gaps where the UI or program may freeze up). Reference counting can operate in smaller chunks but will be slower overall. It may mean fewer freezes but poorer performance overall.

Additionally a reference counting GC requires a cycle detector to clean up any objects in a cycle that won't be caught by their reference count alone. Perl 5 didn't have a cycle detector in its GC implementation and could leak memory that was cyclic.

Research has also been done to get the best of both worlds (low pause times, high throughput): http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/urc-oopsla-2003.pdf

Luke Quinane 2008-10-13 01:42:10

Answer 8

A:

Late in the game, but I think one significant rationale for RC in Python is its simplicity. See this email by Alex Martelli, for example.

(I could not find a link outside the Google cache; the email dates from 13 October 2005 on the Python list.)

David Cournapeau 2009-10-22 01:11:46

I think this is the wrong link.

mtasic 2010-10-31 08:46:21
