It's more a theoretical question than a practical one. I know that GCs currently work with big processes that use 1, 2 or 3 GB of memory, but I'd like to know if it is theoretically possible to have an efficient GC with really huge amounts of memory (1000 GB or more).

I ask this question because, even if the GC could run its algorithm progressively, it needs to wait until it has scanned all objects before freeing any one of them, to be sure no other object uses it. So, in a very big system, memory should logically be freed less frequently. If memory is very big, unused objects would be freed so rarely that the GC would no longer be of much use.

Do you know of any studies or articles on this subject?

+1  A: 

Go through this basic...about garbage collection

http://msdn.microsoft.com/en-us/library/ms973837.aspx

Sam Rudolph
Could you summarize here for us what the article says in response to this question specifically?
DOK
+3  A: 

There are different algorithms, and none that I know of has to scan all memory. In Java and .NET, for example, the garbage collector starts by assuming that all objects are garbage. It then identifies the roots (objects that are always alive), and from there walks the object graph, marking any reachable object as alive. Once all reachable objects are marked, they are compacted, which effectively increases available memory.

The time needed to perform a garbage collection therefore depends not on the total memory addressed by the process, but on the size of the live object graph. The number of dead objects is completely irrelevant, since they are never even considered.

See http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/ for a good explanation.
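
To make the "identify the roots and walk the object graph" idea concrete, here is a minimal sketch of a mark phase over a toy object graph. It is illustrative only, not the internals of any real collector, and the Node class and method names are made up:

    import java.util.ArrayDeque;
    import java.util.Collections;
    import java.util.Deque;
    import java.util.IdentityHashMap;
    import java.util.List;
    import java.util.Set;

    // Toy object graph: each node holds references to other nodes.
    class Node {
        final List<Node> refs;
        Node(Node... refs) { this.refs = List.of(refs); }
    }

    public class MarkPhaseSketch {
        // Mark everything reachable from the roots; anything left unmarked is garbage.
        static Set<Node> mark(List<Node> roots) {
            Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
            Deque<Node> pending = new ArrayDeque<>(roots);
            while (!pending.isEmpty()) {
                Node n = pending.pop();
                if (live.add(n)) {          // first visit: mark as live
                    pending.addAll(n.refs); // and follow its outgoing references
                }
            }
            return live;                    // work is proportional to live references, not heap size
        }

        public static void main(String[] args) {
            Node leaf = new Node();
            Node root = new Node(leaf);
            Node unrooted = new Node(leaf); // never reachable from a root, so never visited
            System.out.println(mark(List.of(root)).size()); // prints 2 (root and leaf)
        }
    }

Note how the dead object is never touched: the loop only ever visits what it can reach from the roots.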

edit: as the author changed the question, this reply somewhat lost its relevance. Still, I'll leave it for documentation purposes.

Dr_Asik
True, though I believe that it is actually proportional to the number of _references_ rather than number of _objects_ (i.e. an array of 1,000,000,000 references to the same object would probably strain the GC quite a bit). So in that sense the "size of the graph" is defined as number of edges in it rather than number of nodes.
Pavel Minaev
I have replaced "all memory" by "all objects" in the question. It was what I really wanted to say.
Rexxar
+2  A: 

Even if the application used more memory, the number of objects would probably not change that much. It would mostly use larger objects, not many more of them.

So, scanning the active references would not take much longer. Only the active objects need to be scanned, not the memory in between them.

With more available memory the GC would of course run less frequently. There is no point in collecting unused objects just to keep memory usage to a minimum; a computer doesn't run any faster for having a lot of unused memory.
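
One way to see the "runs less frequently" effect is to watch the collection counts the JVM exposes. This is only a rough probe, assuming HotSpot's standard management beans; the class name is made up. Run it with different -Xmx values and the same allocation work triggers fewer collections on the larger heap:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcFrequencyProbe {
        public static void main(String[] args) {
            // Produce a fixed amount of short-lived garbage (~1 GB in total).
            for (int i = 0; i < 1_000_000; i++) {
                byte[] junk = new byte[1024];
            }
            long collections = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                collections += gc.getCollectionCount();
            }
            System.out.println("GC runs so far: " + collections);
        }
    }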

Guffa
+2  A: 

I actually expect garbage collection to be an even greater boon on systems with very large memory requirements. My experience to date bears this out (although I haven't hit thousands of GB of usage, just tens).

Typically, systems that use large amounts of memory are using large collections of objects, so you often have a similar number of allocations, but the individual allocations are very large. This keeps GC performance about equal to that of a system using a smaller amount of memory, since GC performance is really tied to the number of rooted objects, not the total size of the objects.

However, when you're doing very large memory allocations, traditional systems tend to run into lots of memory fragmentation problems. Many garbage collectors (although, unfortunately, not .NET's large object heap) compact memory during the collection cycle, which can provide huge benefits for systems that use a lot of memory, especially over time.
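
To illustrate the fragmentation point, here is a toy free-list simulation. It is purely conceptual (no real runtime is involved, all names are made up): the total free space looks generous, but no single hole is big enough for the next large request, which is exactly the situation a compacting collector removes by sliding live blocks together:

    import java.util.ArrayList;
    import java.util.List;

    public class FragmentationSketch {
        // A block in a simulated heap: its size and whether it has been freed.
        record Block(int size, boolean free) {}

        public static void main(String[] args) {
            List<Block> heap = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                heap.add(new Block(100, i % 2 == 0)); // every other 100-unit block was freed
            }
            int totalFree = heap.stream().filter(Block::free).mapToInt(Block::size).sum();
            int largestHole = heap.stream().filter(Block::free).mapToInt(Block::size).max().orElse(0);
            System.out.println("free: " + totalFree + " units, largest hole: " + largestHole + " units");
            // 500 units are free in total, yet a 200-unit allocation still fails without compaction.
        }
    }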

Reed Copsey
A: 

The larger your heap memory, the more efficient the garbage collector. To avoid long pause times, you need a sensible modern collector that works incrementally. (A simple one is packaged with Lua 5.1.)

For a nice article about this, see "Garbage Collection Can Be Faster Than Stack Allocation" by Andrew Appel.
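
On the JVM side, both halves of that advice come down to configuration rather than code: give the collector plenty of headroom with -Xmx, and give a low-pause collector such as G1 a pause-time goal with -XX:MaxGCPauseMillis (both are standard HotSpot options). A tiny sketch for checking how much heap the process actually received (the class name is made up):

    public class HeapHeadroom {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("max heap : " + rt.maxMemory() / (1 << 20) + " MiB");
            System.out.println("committed: " + rt.totalMemory() / (1 << 20) + " MiB");
            System.out.println("free     : " + rt.freeMemory() / (1 << 20) + " MiB");
        }
    }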

Norman Ramsey
+2  A: 

Many systems also employ generational garbage collection. Generational garbage collection pools objects into buckets based on how long ago they were created. The theory is that objects that were just created are far more likely to be cleaned up than something that has been in memory constantly for days. The garbage collection system can then be tuned based on the memory profile of your application. Since the in-memory objects are bucketed by age, really long-lived things such as reference tables can be checked far less frequently and thus will not put the same load on the garbage collector.
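
A small JVM-flavoured sketch of that idea, assuming HotSpot's standard management API (the class and field names are made up): the long-lived "reference table" survives into the old generation and is then mostly ignored by young-generation collections, while the short-lived temporaries are reclaimed cheaply:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.ArrayList;
    import java.util.List;

    public class GenerationalSketch {
        static final List<long[]> LONG_LIVED = new ArrayList<>(); // promoted once, rarely rescanned

        public static void main(String[] args) {
            for (int i = 0; i < 100; i++) {
                LONG_LIVED.add(new long[100_000]);  // ~80 MB of long-lived data
            }
            for (int i = 0; i < 5_000_000; i++) {
                String temp = "request-" + i;       // dies young, cheap to reclaim
            }
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // Typical HotSpot setups report separate young- and old-generation collectors
                // here (the exact bean names vary by collector).
                System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
            }
        }
    }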

Jacob

TheJacobTaylor
Generational GC is a huge help in this. We have a Lisp application that can take up 100 GB of memory, and normal GCs are nice and fast. Periodic global GCs take longer, but there's tuning that can help with that. (There's also the notion of fencing off old areas; that helps us in our application, where we can say "go ahead and run this process, but existing memory in the image isn't garbage, so don't bother to scan it." That's kind of Lisp-specific, though.)
khedron