Garbage collection has been around since the early days of LISP, and now - several decades on - most modern programming languages utilize it.

Assuming that you're using one of these languages, what reasons would you have to not use garbage collection, and instead manually manage the memory allocations in some way?

Have you ever had to do this?

Please give solid examples if possible.

+5  A: 

Memory allocations? No, I think the GC is better at it than I am.

But scarce resource allocations, like file handles, database connections, etc.? I write the code to close those when I'm done. GC won't do that for you.
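
For example, in C# (a sketch; the connection string and query are placeholders):

    using System.Data.SqlClient;

    // Dispose of scarce resources deterministically instead of waiting for the GC.
    public static int CountOrders(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }   // Dispose() runs here and closes the connection immediately;
            // a finalizer would run at some unknown later time, if at all.
    }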

duffymo
Some GCs support finalizers, which *will* do those things.
geocar
Yes, but finalisers always run some time after the object has become orphaned, which means handles and sockets will be open for longer than expected. Yikes!
rpetrich
Never rely on finalizers for resources you care about.
Barry Kelly
I don't use finalizers.
duffymo
In most systems it's very hard for the system to guarantee *when* the finalizer will run or even *if* it will run...
Norman Ramsey
+16  A: 

I can think of a few:

Deterministic deallocation/cleanup

Real time systems

Not giving up half the memory or processor time - depending on the algorithm

Faster memory alloc/dealloc and application-specific allocation, deallocation and management of memory. Basically writing your own memory stuff - typically for performance-sensitive apps. This can be done where the behavior of the application is fairly well understood. For general-purpose GC (like for Java and C#) this is not possible.
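
In a GC language, the closest everyday analogue of that last point is pooling and reusing buffers rather than letting each request allocate its own. A rough C# sketch (ArrayPool ships with more recent .NET; the readChunk callback is just a placeholder):

    using System;
    using System.Buffers;

    // Reuse large buffers instead of allocating a fresh array per request,
    // keeping the hot path out of the GC's way.
    public static class BufferExample
    {
        public static int ProcessChunk(Func<byte[], int, int> readChunk)
        {
            byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
            try
            {
                int bytesRead = readChunk(buffer, buffer.Length); // placeholder reader
                return bytesRead;
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(buffer); // hand the buffer back to the pool
            }
        }
    }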

EDIT

That said, GC has certainly been good for much of the community. It allows us to focus more on the problem domain rather than nifty programming tricks or patterns. I'm still an "unmanaged" C++ developer though. Good practices and tools help in that case.

Tim
For deterministic scheduling in real-time applications, you typically don't do any memory management. We pre-allocated all structures. We degraded processing to stay within the memory limits.
S.Lott
Yes, sorry if I was unclear and mixed that with the other comments.
Tim
Keep in mind that with ephemeral/generational collectors, allocation from a GC *can* be much faster than a plain ol' malloc.
runT1ME
I don't doubt it.
Tim
Don't forget C# supports both GC and unmanaged memory. If you really need memory outside the GC, it can do it.
Jonathan Allen
+4  A: 

Real-time applications are probably difficult to write with a garbage collector. Maybe with an incremental GC that works in another thread, but that adds overhead.

martinus
+2  A: 

One case I can think of is when you are dealing with large data sets amounting to hundreds of megabytes or more. Depending on the situation you might want to free this memory as soon as you are done with it, so that other applications can use it.

Also, when dealing with some unmanaged code there might be a situation where you might want to prevent the GC from collecting some data because it's still being used by the unmanaged part. Though I still have to think of a good reason why simply keeping a reference to it might not be good enough. :P
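
One reason a live reference alone isn't always enough: a compacting collector is free to move the object even while it is reachable, so a raw pointer handed to unmanaged code can end up pointing at stale memory. Pinning addresses both problems. A rough C# sketch (the native function is hypothetical):

    using System;
    using System.Runtime.InteropServices;

    // Pin a managed buffer so the GC can neither collect nor relocate it while
    // unmanaged code holds a raw pointer to it.
    public static class PinningExample
    {
        [DllImport("nativelib")]   // hypothetical unmanaged library
        private static extern void native_fill(IntPtr buffer, int length);

        public static byte[] FillBuffer(int length)
        {
            byte[] buffer = new byte[length];
            GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
            try
            {
                native_fill(handle.AddrOfPinnedObject(), buffer.Length);
            }
            finally
            {
                handle.Free();   // always release the handle, or the buffer stays pinned
            }
            return buffer;
        }
    }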

Vilx-
+3  A: 

One situation I've dealt with is image processing. While working on an algorithm for cropping images, I've found that managed libraries just aren't fast enough to cut it on large images or on multiple images at a time.

The only way to do processing on an image at a reasonable speed was to use non-managed code in my situation. This was while working on a small personal side project in C# .NET where I didn't want to pull in a third-party library, both because of the size of the project and because I wanted to learn to do it myself. There may have been an existing third-party library (perhaps Paint.NET) that could do it, but it still would require unmanaged code.
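
Something along these lines, as a sketch (assumes a 32bpp ARGB bitmap and compiling with /unsafe; the invert operation is just a stand-in for real processing):

    using System.Drawing;
    using System.Drawing.Imaging;

    // Walk the pixel data with raw pointers instead of GetPixel/SetPixel,
    // which is far too slow for large images.
    public static unsafe void Invert(Bitmap image)
    {
        Rectangle rect = new Rectangle(0, 0, image.Width, image.Height);
        BitmapData data = image.LockBits(rect, ImageLockMode.ReadWrite,
                                         PixelFormat.Format32bppArgb);
        try
        {
            byte* row = (byte*)data.Scan0.ToPointer();
            for (int y = 0; y < data.Height; y++, row += data.Stride)
            {
                for (int x = 0; x < data.Width * 4; x += 4)
                {
                    row[x]     = (byte)(255 - row[x]);     // blue
                    row[x + 1] = (byte)(255 - row[x + 1]); // green
                    row[x + 2] = (byte)(255 - row[x + 2]); // red
                    // row[x + 3] is alpha; leave it as-is
                }
            }
        }
        finally
        {
            image.UnlockBits(data);
        }
    }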

Dan Herbert
A: 

In theory, nothing. In practice though, don't use it if it can't perform for your app.

Different GC algorithms may or may not be efficient for different types of applications. Some GCs are better for long-running apps, some are tuned for throughput, some are tuned for reducing latency, and some just suck in general.

I've had a few instances where Java's GC was less than efficient, and I wished I could manage my own memory. Basically I was using a TON of memory that became garbage right away, and because of the way the GC worked, some of it ended up in the 'tenured' generation when it didn't need to, and I couldn't force Java to use copy collection for all of its memory.

Having 16 gigs of RAM instead of 8 probably would have fixed the problem too. All in all, I just had to do some extra tuning to get it working, and since I can't turn GC 'off' in Java, that was my only option.

I suspect Java 7's new GC would have fixed my problem.

runT1ME
A: 

I don't quite understand the question. Since you ask about a language that uses GC, I assume you are asking for examples like

  1. Deliberately hang on to a reference even when I know it's dead, maybe to reuse the object to satisfy a future allocation request.
  2. Keep track of some objects and close them explicitly, because they hold resources that can't easily be managed with the garbage collector (open file descriptors, windows on the screen, that sort of thing).

I've never found a reason to do #1, but #2 is one that comes along occasionally. Many garbage collectors offer mechanisms for finalization, which is an action that you bind to an object and the system runs that action before the object is reclaimed. But oftentimes the system provides no guarantees about when, or even whether, finalizers actually run, so finalization can be of limited utility.

The main thing I do in a garbage-collected language is to keep a tight watch on the number of allocations per unit of other work I do. Allocation is usually the performance bottleneck, especially in Java or .NET systems. It is less of an issue in languages like ML, Haskell, or LISP, which are typically designed with the idea that the program is going to allocate like crazy.


EDIT: longer response to comment.

Not everyone understands that when it comes to performance, the allocator and the GC must be considered as a team. In a state-of-the-art system, allocation is done from contiguous free space (the 'nursery') and is as quick as test and increment. But unless the object allocated is incredibly short-lived, the object incurs a debt down the line: it has to be copied out of the nursery, and if it lives a while, it may be copied through several generations. The best systems use contiguous free space for allocation and at some point switch from copying to mark/sweep or mark/scan/compact for older objects. So if you're very picky, you can get away with ignoring allocations if

  • You know you are dealing with a state-of-the-art system that allocates from contiguous free space (a nursery).
  • The objects you allocate are very short-lived (less than one allocation cycle in the nursery).

Otherwise, allocated objects may be cheap initially, but they represent work that has to be done later. Even if the cost of the allocation itself is a test and increment, reducing allocations is still the best way to improve performance. I have tuned dozens of ML programs using state-of-the-art allocators and collectors and this is still true; even with the very best technology, memory management is a common performance bottleneck.
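
A contrived C# sketch of what cutting allocations looks like in practice; both functions do the same job, but the first allocates a fresh intermediate string on every step while the second reuses one growable buffer:

    using System.Text;

    public static class AllocationExample
    {
        // Allocates a new string on every iteration.
        public static string JoinNaive(string[] words)
        {
            string result = "";
            foreach (string w in words)
                result = result + w + " ";   // each '+' allocates a fresh string
            return result;
        }

        // One buffer, reused across the whole loop.
        public static string JoinBuffered(string[] words)
        {
            var sb = new StringBuilder();
            foreach (string w in words)
                sb.Append(w).Append(' ');
            return sb.ToString();
        }
    }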

And you'd be surprised how many allocators don't deal well even with very short-lived objects. I just got a big speedup from Lua 5.1.4 (probably the fastest of the scripting languages, with a generational GC) by replacing a sequence of 30 substitutions, each of which allocated a fresh copy of a large expression, with a simultaneous substitution of 30 names, which allocated one copy of the large expression instead of 30. Performance problem disappeared.

Norman Ramsey
This is bad info. Allocation is not a bottleneck in GC; allocation is incredibly cheap (a compare and pointer increment). With generational GCs, what you want to watch out for is mid-life crisis: memory that is allocated, kept around for a while, then dumped. Don't do that.
Barry Kelly
State-of-the-art allocation is indeed incredibly cheap. But not all systems allocate from contiguous free space. Incredible though it may seem, some systems still use free lists and mark-and-sweep collectors. As systems mature, algorithms become more sophisticated and performance improves.
Norman Ramsey
+1  A: 

There are two major types of real-time systems, hard and soft. The main distinction is that a hard real-time system requires that an algorithm always finishes within a particular time budget, whereas a soft system only wants that to happen in the normal case. Soft systems can potentially use well-designed garbage collectors, although a normal one would not be acceptable. However, if a hard real-time algorithm does not complete in time, lives could be in danger. You will find these sorts of systems in nuclear reactors, aeroplanes and space shuttles, and even then only in the specialist software that the operating systems and drivers are made of. Suffice it to say, this is not your common programming job.

People who write these systems don't tend to use general-purpose programming languages. Ada was designed for the purpose of writing these sorts of real-time systems. Even though it is already a special language for such systems, on some projects the language is cut down further to a subset known as Spark. Spark is a safety-critical subset of the Ada language, and one of the things it does not allow is the creation of new objects: the new keyword is banned outright, because of its potential to run out of memory and its variable execution time. Indeed, all memory access in Spark is done through absolute memory locations or stack variables, and no new allocations are made on the heap. A garbage collector is not only totally useless but harmful to the guaranteed execution time.
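
Loosely, the same no-heap-allocation idea can be sketched in a GC language like C# with stackalloc (this is only an analogy, not Spark):

    // A fixed-size scratch buffer carved out of the stack frame: nothing lands
    // on the GC heap, and the buffer disappears when the method returns.
    // Requires compiling with /unsafe.
    public static unsafe int Checksum(byte seed, int rounds)
    {
        byte* scratch = stackalloc byte[256];     // stack memory, not heap memory
        int sum = 0;
        for (int i = 0; i < rounds && i < 256; i++)
        {
            scratch[i] = (byte)(seed ^ i);
            sum += scratch[i];
        }
        return sum;
    }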

These sorts of systems are not exactly common, but where they exist some very special programming techniques are required and guaranteed execution times are critical.

+4  A: 

I do a lot of embedded development, where the question is more likely to be whether to use malloc or static allocation and garbage collection is not an option.

I also write a lot of PC-based support tools and will happily use GC where it is available & fast enough and it means that I don't have to use pedant::std::string.

I write a lot of compression & encryption code and GC performance is usually not good enough unless I really bend the implementation. GC also requires you to be very careful with address aliasing tricks. I normally write performance-sensitive code in C and call it from Python / C# front ends.
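
Roughly how the C#-front-end-calling-C arrangement looks via P/Invoke, as a sketch (the library name mycompress and the function compress_block are made up for illustration):

    using System;
    using System.Runtime.InteropServices;

    // Call a performance-critical C routine from a C# front end.
    public static class NativeCompression
    {
        [DllImport("mycompress", CallingConvention = CallingConvention.Cdecl)]
        private static extern int compress_block(byte[] input, int inputLength,
                                                 byte[] output, int outputCapacity);

        public static byte[] Compress(byte[] input)
        {
            byte[] output = new byte[input.Length * 2];   // worst-case guess for the sketch
            int written = compress_block(input, input.Length, output, output.Length);
            if (written < 0)
                throw new InvalidOperationException("compression failed");
            Array.Resize(ref output, written);
            return output;
        }
    }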

So my answer is that there are reasons to avoid GC, but the reason is almost always performance and it's then best to code the stuff that needs it in another language rather than trying to trick the GC.

If I develop something in MSVC++, I never use garbage collection. Partly because it is non-standard, but also because I've grown up without GC in C++ and automatically design in safe memory reclamation. Having said this, I think that C++ is an abomination which fails to offer the translation transparency and predictability of C or the scoped memory safety (amongst other things) of later OO languages.

+2  A: 

Two words: Space Hardening

I know it's an extreme case, but still applicable. One of the coding standards that applied to the core of the Mars rovers actually forbade dynamic memory allocation. While this is indeed extreme, it illustrates a "deploy and forget about it with no worries" ideal.

In short, have some sense of what your code is actually doing to someone's computer. If you do, and you are conservative... then let the memory fairy take care of the rest. While you develop on a quad core, your user might be on something much older, with much less memory to spare.

Use garbage collection as a safety net, but be aware of what you allocate.

Tim Post
A: 

Just about all of these answers come down to performance and control. One angle I haven't seen in earlier posts is that skipping GC gives your application more predictable cache behavior in two ways.

  1. In certain cache sensitive applications, having the language automatically trash your cache every once in a while (although this depends on the implementation) can be a problem.
  2. Although GC is orthogonal to allocation, most implementations give you less control over the specifics. A lot of high performance code has data structures tuned for caches, and implementing stuff like cache-oblivious algorithms requires more fine grained control over memory layout. Although conceptually there's no reason GC would be incompatible with manually specifying memory layout, I can't think of a popular implementation that lets you do so.
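
One layout control you do get in C#, as a sketch: an array of structs is a single contiguous block, whereas an array of class instances is an array of references to objects the GC may have scattered around the heap.

    using System.Runtime.InteropServices;

    // Laid out sequentially, so Particle[] is one contiguous block that a
    // linear scan walks in order.
    [StructLayout(LayoutKind.Sequential)]
    public struct Particle
    {
        public float X, Y, Z;
        public float VelocityX, VelocityY, VelocityZ;
    }

    public static class LayoutExample
    {
        public static void Integrate(Particle[] particles, float dt)
        {
            for (int i = 0; i < particles.Length; i++)
            {
                particles[i].X += particles[i].VelocityX * dt;
                particles[i].Y += particles[i].VelocityY * dt;
                particles[i].Z += particles[i].VelocityZ * dt;
            }
        }
    }
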
Kobold
A: 

In video games, you don't want the garbage collector to run in the middle of a game frame.

For example, the Big Bad is in front of you and you are down to 10 life. You decide to run towards the Quad Damage powerup. As soon as you pick up the powerup, you prepare to turn towards your enemy and fire your strongest weapon.

When the powerup disappears, it would be a bad idea to run the garbage collector just because the game world has to delete the data for the powerup.

Video games usually manage their objects by figuring out what is needed in a certain map (this is why it takes a while to load maps with a lot of objects). Some game engines will call the garbage collector after certain events (after saving, when the engine detects there's no threat in the vicinity, etc.).
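
A minimal sketch of that reuse idea in C# (the Powerup type in the usage comment is hypothetical):

    using System.Collections.Generic;

    // A simple object pool: everything is allocated up front, and "destroying"
    // an object just returns it to the free list, so no garbage is created
    // mid-frame for the collector to chase.
    public class Pool<T> where T : new()
    {
        private readonly Stack<T> free = new Stack<T>();

        public Pool(int capacity)
        {
            for (int i = 0; i < capacity; i++)
                free.Push(new T());            // pay the allocation cost at load time
        }

        public T Acquire()
        {
            return free.Count > 0 ? free.Pop() : new T();   // fallback if the pool runs dry
        }

        public void Release(T item)
        {
            free.Push(item);                   // reuse instead of letting it become garbage
        }
    }

    // Usage sketch:
    //   var powerups = new Pool<Powerup>(32);
    //   Powerup p = powerups.Acquire();   // on spawn
    //   powerups.Release(p);              // on pickup, instead of dropping the reference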

Other than video games, I don't find any good reasons to turn off garbage collection.

Edit: After reading the other comments, I realized that embedded systems and Space Hardening (Bill's and tinkertim's comments, respectively) are also good reasons to turn off the garbage collector.

MrValdez