Most programmers agree that garbage collection is a great thing, and in most applications it is well worth the overhead. However, my personal observation is that memory management for most objects is trivial, and maybe 10%-20% of them account for the need for kludges such as reference counting and really complicated memory management schemes in general.

It seems to me like one could get all the benefits of garbage collection with only a small fraction of the overhead by conservatively deleting large objects manually where the object's lifetime is obvious and letting the GC collect the rest, assuming the GC implementation supports such a thing. This would allow the GC to run much less frequently and consume less excess memory, while still avoiding the cases that are actually hard to manage manually.

Even more interesting would be if the compiler inserted deterministic delete statements automatically where lifetimes were obvious:

int myFunc() {
    Foo[] foo = new Foo[arbitraryNumber];  // May be too big to stack allocate.
    // do stuff such that the compiler can prove foo doesn't escape.
    // foo is obviously no longer needed, can be automatically deleted here.
    return someInteger;
}

Of course, this might not work well with a copying GC, but for the sake of this post let's assume our GC isn't copying. Why are such hybrid memory management schemes apparently so rare in mainstream programming languages?

+1  A: 

A quick note on the "obviously no longer needed" part: it's not THAT easy ;)

[...]
Foo() {
    someGlobalList.add(this);  // "this" escapes through the global list right in the constructor
}
[...]

Apart from that, your idea of being able to manually delete big thingies is IMO a good one. As I understand it, it is at least partly implemented in modern languages (e.g. the using statement in C#, which sadly does not actually free the memory), though not to the degree you wish for.
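For illustration, a minimal C# sketch of what using actually buys you (BigThing and its buffer are invented names): Dispose runs deterministically at the end of the block, but the managed memory itself is only reclaimed later, whenever the GC gets around to it.

using System;

class BigThing : IDisposable
{
    byte[] buffer = new byte[10000000];  // managed memory owned by the GC

    public void Dispose()
    {
        // Release unmanaged resources (file handles, etc.) here.
        // The byte[] above is NOT freed at this point; it merely becomes
        // collectible once no references remain and the GC eventually runs.
    }
}

class Program
{
    static void Main()
    {
        using (var thing = new BigThing())
        {
            // ... use thing ...
        }  // Dispose() runs here, deterministically
    }
}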

mafutrct
Yes, but consider the case of "string s1, s2, s3; ... s1 + s2 + s3". The memory allocated when concatenating s1 and s2 will be garbage before the end of the expression, guaranteed. -- My understanding is that various Lisps have had these kinds of optimizations.
Aaron
+3  A: 

Because this case is too rare. Almost no method is isolated. They all accept objects from outside or create objects and pass them out.

A method which doesn't access any fields, doesn't have parameters, and doesn't return anything can't do anything useful.

Instead, GCs concentrate on the most common case (the 90%) and try to keep those 90% (the short-lived temporary objects) in check. This means that in the common case you have fewer objects to check, and the rest doesn't matter that much. On top of that, you use an incremental sweep, so the collector can run in little sprints that interrupt the program only for a short moment.

I once tried to come up with a better GC algorithm and failed miserably. Real GC implementations use approaches that border on the arcane. The Java 5 GC Performance Tuning document should give you some ideas, and there is of course the garbage collection article on Wikipedia.

What it boils down to: using a GC can even be faster than a traditional allocate-and-free scheme. Think of the classic copying algorithm, which just locates every reachable object and copies it to a new place. If you have simply forgotten about a whole lot of objects (say, 90% of everything allocated), the collector only needs to touch the remaining 10%. Anything it can't reach, no matter how much of it there is, costs nothing. Now you may say that copying is expensive, but a) it isn't; today an average desktop CPU can copy 40 MB in less than 100 ms, and b) the copying protects you against fragmentation, so it is actually a good thing.
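To make that cost model concrete, here is a toy semispace ("copying") collector sketched in C#. All names (ToyObject, ToyHeap, and so on) are invented, and it only models the bookkeeping with index-based references rather than real memory, but it shows the key property: collection work is proportional to the live objects that get copied, while unreachable objects are simply never visited.

using System;
using System.Collections.Generic;

class ToyObject
{
    public int Payload;
    public List<int> Children = new List<int>();  // indices into the same space
    public int Forwarded = -1;                    // index in to-space once copied
}

class ToyHeap
{
    public List<ToyObject> FromSpace = new List<ToyObject>();
    List<ToyObject> toSpace;

    // Copy an object to to-space (at most once) and return its new index.
    int Forward(int index)
    {
        ToyObject obj = FromSpace[index];
        if (obj.Forwarded == -1)
        {
            toSpace.Add(obj);
            obj.Forwarded = toSpace.Count - 1;
        }
        return obj.Forwarded;
    }

    // Cheney-style collection: copy the roots, then scan to-space and forward
    // every reference found there. Garbage is never even looked at.
    public void Collect(int[] roots)
    {
        toSpace = new List<ToyObject>();
        for (int i = 0; i < roots.Length; i++)
            roots[i] = Forward(roots[i]);

        for (int scan = 0; scan < toSpace.Count; scan++)
        {
            List<int> children = toSpace[scan].Children;
            for (int i = 0; i < children.Count; i++)
                children[i] = Forward(children[i]);
        }

        FromSpace = toSpace;              // flip; everything not copied is gone
        foreach (ToyObject o in FromSpace)
            o.Forwarded = -1;             // reset for the next collection
    }
}

class Demo
{
    static void Main()
    {
        ToyHeap heap = new ToyHeap();
        heap.FromSpace.Add(new ToyObject { Payload = 1, Children = { 1 } });  // root, points to #1
        heap.FromSpace.Add(new ToyObject { Payload = 2 });
        heap.FromSpace.Add(new ToyObject { Payload = 3 });                    // unreachable garbage

        int[] roots = { 0 };
        heap.Collect(roots);
        Console.WriteLine(heap.FromSpace.Count);  // prints 2; the garbage object was never touched
    }
}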

Aaron Digulla
Ok, dumb mistake. I was basically implying a pure function, but pure functions should return something. Fixed.
dsimcha
Check all the source code you've written in your life. I doubt that you will find many methods that do something useful *and* don't work on objects from an outside scope. Also, as I explained, live objects are expensive; dead objects are free.
Aaron Digulla
In my experience, <GC-language> app developers aren't even aware of half the allocations they make. Perhaps you just weren't aware of these trivial garbage cases because you didn't realize how much garbage you made.
Aaron
And they don't need to be. GC was a CPU hog in the dark ages, but things have improved considerably. With every passing day we can forget about this more and more and concentrate on our work instead of on how the OS works.
Aaron Digulla
+1  A: 

Modern GC implementations already do fairly effectively what you describe as deterministic delete statements for non-escaping objects. Most allocated objects come from pools and are reclaimed very efficiently on method exit; very few end up on the slower-to-collect heap.

In effect this produces the same result you describe, with less programmer intervention. And because a garbage collector is automatic, there is less scope for human error (what if you deleted something for which a reference was still held?).

Nick Fortescue
A: 

If you are creating an object that you expect will be used only within the context of one method call at a time, and that has no finalization, then I would recommend using a value type instead of a reference type in languages that make this distinction. For example, in C# you could declare your entity as a struct rather than a class. Then your short-lifetime, method-local instances will be allocated on the stack rather than the heap, and they will be deallocated implicitly the moment the method returns.

And this technique can go even further than your original idea, because the struct can be passed to other methods without concern for breaking the lifetime analysis.

For an array, as in the question, you can use the stackalloc keyword to achieve this effect.
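For illustration, a rough C# sketch of both techniques (Point, Sum, and MyFunc are invented names; arbitraryNumber mirrors the question, and stackalloc requires an unsafe context in the C# of this era):

struct Point            // value type: lives inline on the stack when used as a local
{
    public int X;
    public int Y;
}

static class Example
{
    static int Sum(Point p)   // the struct is copied by value; nothing escapes to the heap
    {
        return p.X + p.Y;
    }

    static unsafe int MyFunc(int arbitraryNumber)
    {
        Point p;              // stack-allocated, gone the moment the method returns
        p.X = 1;
        p.Y = 2;

        int* values = stackalloc int[arbitraryNumber];  // stack-allocated array, no GC involvement
        int total = Sum(p);
        for (int i = 0; i < arbitraryNumber; i++)
        {
            values[i] = i;
            total += values[i];
        }
        return total;
    }
}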

Jeffrey L Whitledge
I was thinking this smells suspiciously similar to stack vs. heap allocated objects :)
+2  A: 

"Most programmers agree that garbage collection is a great thing, and in most applications is well worth the overhead."

Sweeping generalizations...

I second this statement.
Sybiam
I agree that garbage collection is a great thing, and in most applications is well worth the overhead. ;-)
teedyay
Don't get me wrong, both GC and non-GC have their place IMO.
A: 

This seems to be exactly how D manages memory. It's garbage collected while still allowing you to explicitly delete objects when you want to, or even eschew the GC altogether in favor of malloc/free. The scope keyword seems to do what you want for stack-like lifetimes, although I'm not sure whether it actually allocates the object on the stack or on the heap.

BigSandwich