Most modern languages have built-in garbage collection (GC): Java, the .NET languages, Ruby, etc. GC does simplify application development in many ways.

I am interested in the limitations and disadvantages of writing applications in GCed languages. Assuming the GC implementation is optimal, I am wondering whether a GC limits the optimization decisions we can make.

A: 

If you are confident (and good) in your memory management skills, there is no advantage.

The concept was introduced to reduce development time and to compensate for the shortage of programmers who thoroughly understood memory.

Gollum
"confident" is different from "good" ;)
Sean Edwards
There are MANY advantages to a GC, even if you are very skilled and confident in your memory management. This becomes especially true if you're using a compacting GC...
Reed Copsey
@Sean, thanks :D
Gollum
The answer is a bit smug IMO. It's like saying C was invented for people who do not thoroughly understand assembler. Garbage collection greatly simplifies programming and hence allows more efficient development. Full control over memory is needed in very few places, as are the theoretical performance gains.
inflagranti
*Garbage collection greatly simplifies programming* - isn't that what I wrote? And I am not against GC; I just stated why it was introduced. You can differ; it's just my opinion (how I see things). C definitely made it easier to program than assembly did.
Gollum
Memory leaks are generally caused by things being missed, not by lack of expertise. Although *I* don't write bugs, I hear they happen to other people for this reason.
jfsk3
@jfsk3, memory leaks are not the only problem; fragmentation is an issue too, so you need to be careful.
Gollum
@Gollum: I appreciate the point of view. I have noticed that subjective answers are discouraged on Stack Overflow, unfortunately. I think the community is consciously trying to prevent Slashdot-style discussions.
ragu.pattabi
-1: Memory safety of *all* managed code (not just your own) is an obvious counter example.
Jon Harrop
@Jon, no offense, but can you tell me why people do not run it on embedded devices? I am just saying that it reduces development time by abstracting away the memory details, but you pay a good price for that. If your code were free of any memory problems (let's assume), which would you prefer it to be: managed or unmanaged?
Gollum
@Gollum: "can you tell me why people do not run it on embedded devices?". People do run managed code on embedded devices. Embedded Java is very common, particularly in smart cards. One of my customers writes embedded C#/F# for Welch Allyn's stethoscopes. Look at the number of phone apps written in Java or the number of embedded telecoms systems written in Erlang, for example.
Jon Harrop
@Jon, thanks, but I am still confused: wouldn't the GC knock down the performance? My arguments are theoretical; I don't have enough experience to judge anything, so I am just clearing up my doubts.
Gollum
@Gollum: "wouldn't the GC knock down the performance?" Yes, but it makes hard problems easier to solve, and software complexity is increasingly more of a concern than performance as compute power continues to improve.
Jon Harrop
+8  A: 

The main disadvantages to using a garbage collector, in my opinion, are:

  1. Non-deterministic cleanup of resources. Sometimes it is handy to say "I'm done with this, and I want it cleaned up NOW". With a GC, this typically means either forcing the GC to clean up everything or just waiting until it's ready, both of which take some control away from you as a developer.

  2. Potential performance issues arising from the non-deterministic operation of the GC. When the GC collects, it is common to see (small) hangs, etc. This can be particularly problematic for things such as real-time simulations or games.
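For point 1, GCed languages usually offer a deterministic disposal mechanism layered on top of the collector. Illustrated here in Java with try-with-resources (C# has the analogous `using` statement); the class and event names are invented for the sketch:

```java
public class DisposalDemo {
    static class Resource implements AutoCloseable {
        private final StringBuilder log;
        Resource(StringBuilder log) { this.log = log; log.append("open;"); }
        void use() { log.append("use;"); }
        @Override public void close() { log.append("close;"); }
    }

    // Returns the event order: close() runs deterministically at the end of
    // the try block, even though the *memory* is reclaimed later by the GC.
    static String run() {
        StringBuilder log = new StringBuilder();
        try (Resource r = new Resource(log)) {  // close() guaranteed on exit
            r.use();
        }
        log.append("after-block");
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());  // open;use;close;after-block
    }
}
```

This restores deterministic cleanup of the *resource* (file handle, socket, lock), but not of the memory itself, which is still the GC's call.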

Reed Copsey
+1 *Adds GC to the list of reasons he dies in games*
glowcoder
@glowcoder: Isn't it great? I love being able to have something to blame for that... ;)
Reed Copsey
@Reed Copsey: Is non-deterministic cleanup a common GC problem, or is it specific to particular GC implementations?
ragu.pattabi
@ragu.pattabi: GCs, by their nature, nearly always have non-deterministic cleanup. That is really a major point of a GC in the first place: you don't worry about when or how memory is freed; you leave it to the collector.
Reed Copsey
+3  A: 

For .NET, there are two disadvantages that I can see.

1) People assume that the GC knows best, but that's not always the case. If you make certain types of allocations, you can cause some really nasty program deaths without direct invocation of the GC.

2) Objects larger than 85k go onto the LOH, or Large Object Heap. That heap is currently NEVER compacted, so your program can hit out-of-memory exceptions even when free memory exists, because the LOH is too fragmented for another allocation.

Both of these bugs are shown in code that I posted in this question:

http://stackoverflow.com/questions/2860917/how-do-i-get-net-to-garbage-collect-aggressively

mmr
The Large Object Heap is interesting. Is it a .NET-specific thing?
ragu.pattabi
@ragu.pattabi: Yes. Basically, in .NET, any single allocation of 85k or more (e.g., a large array of structures) is made using a "traditional"-style allocation and is not compacted with the rest of the GC heap.
Reed Copsey
@Reed Copsey: That's interesting. I see; looking at specific GC implementations could reveal limitations specific to them, though most of the limitations are common.
ragu.pattabi
+1 - I have encountered some very crippling issues with the LOH and .NET, especially when doing any kind of COM interoperability or building Windows service modules that run for days. I wish there were a way to force a purge of the LOH in .NET.
James
I wonder why the LOH allocates things on small boundaries rather than 4K boundaries? Padding an 85K+ object to the next 4K would waste at most about 5% of the space, and the LOH could then be compacted using the page table. As it is, my basic philosophy tends to be "avoid allocating anything over 80K".
supercat
+2  A: 

The biggest problem when it comes to performance (especially in real-time systems) is that your program may experience unexpected delays when the GC kicks in. However, modern GCs try to avoid this and can be tuned for real-time purposes.

Another obvious limitation is that you cannot manage memory yourself (for instance, allocate in NUMA-local memory), which you may need to do when implementing low-level software.
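That said, some GCed runtimes do expose escape hatches for memory the collector does not manage. In Java, for example, a direct `ByteBuffer` lives outside the normal GC heap (though this still gives no control over NUMA placement); a minimal sketch:

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Direct buffers are allocated outside the normal GC heap and are
        // not moved by the collector -- useful for I/O, but the program
        // still cannot choose *where* (e.g., which NUMA node) they live.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        buf.putInt(0, 42);
        System.out.println(buf.isDirect());  // true
        System.out.println(buf.getInt(0));   // 42
    }
}
```

This is a workaround rather than real manual memory management: the buffer's backing storage is freed only when the buffer object itself is eventually collected.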

inflagranti
Typically, if you have very low-level requirements (i.e., NUMA-local memory), you can drop into traditional allocation for that...
Reed Copsey
Contrary to what I thought, .NET has a flavor for embedded applications called the .NET Micro Framework. It has a GC.
ragu.pattabi
@Reed Copsey: But then you have, IMO, a hybrid system that allows both. I think most GC languages do not allow such a thing.
inflagranti
+5  A: 

Take it from a C programmer ... it is about cost/benefit and appropriate use.

With garbage collection algorithms such as tri-color mark-and-sweep, there is often significant latency between a resource being 'lost' and the physical resource being freed. In some runtimes the GC will actually pause execution of the program to perform garbage collection.

Being a long time C programmer, I can tell you:

a) Manual free() memory management is hard -- humans placing free() calls have a higher error rate than GC algorithms do.

b) Manual free() memory management costs time -- does the time spent debugging outweigh the millisecond pauses of a GC? It may be more beneficial to use garbage collection if you are writing a game than, say, an embedded kernel.

But when you can't afford the runtime disadvantage (tight resources, real-time constraints), performing manual resource allocation is probably better. It may take time, but it can be 100% efficient.

Try to imagine an OS kernel written in Java, or on the .NET runtime with a GC ... Just look at how much memory the JVM consumes when running simple programs. I am aware that projects like this exist ... they just make me feel a bit sick.

Just bear in mind, my Linux box does much the same things today with 3GB of RAM as it did years ago with 512MB. The only difference is that I now have mono/jvm/firefox etc. running. The business case for GC is clear, but it still makes me uncomfortable a lot of the time.

Good books:

Dragon book (recent edition), Modern Compiler Implementation in C

Aiden Bell
+1, precisely.
Gollum
I agree with this - but... "performing manual resource allocation" in a way that's "100% efficient" often leads to basically writing your own "mini-GC". It's a LOT of effort to do well, and often leads to other problems. (For example, trying to prevent memory fragmentation is very challenging in C...)
Reed Copsey
@Aiden Bell: Not sure if this is radical, but why not, if memory is not a problem in the near future? Improvements to an OS kernel with GC could benefit all the apps running on the OS, whereas improvements to apps only help those apps individually. A GC could learn about the apps it runs (AI?) and behave in an optimal way.
ragu.pattabi
@Reed Copsey -- Tell me about it. But a well-defined program written as a set of 'spinning' algorithms can have perfect allocation without too much of a headache.
Aiden Bell
@ragu - It is all about weighing up the cost/benefit. It may be simpler for the programmer, but the cost in an OS is greater than in a basic business app.
Aiden Bell
@Aiden: I only mentioned this because I've had to write my own custom allocators with compaction. If your application has a relatively flat memory usage pattern, it's pretty easy and much more efficient; but if it's not, it's a royal pain in the ....
Reed Copsey
@ragu.pattabi: There is actually research in this area, some serious and a lot of it hobbyist (like http://jos.sourceforge.net/). Most operating systems already have their own memory management routines that do for an executable a lot of what a GC does, though...
Reed Copsey
@Reed Copsey - I agree completely. I have rewritten some C apps because I ended up passing context to functions and essentially taking an OOP approach ... when a C app takes on that 'shape' of data, I would usually go for Python and take the dive. I'm neither for nor against either method ... each is suitable within given constraints.
Aiden Bell
@Reed, correct me if I am wrong, but mostly we write allocators to avoid fragmentation or to avoid copying data on reallocation when the size is not static (e.g., a vector). In these cases a GC will be a big pain as well, if you are looking for performance.
Gollum
@Gollum - I would say heap fragmentation is a separate issue from GC. Garbage collection doesn't help matters if you don't have a good allocation scheme, but an allocation routine that works with the GC can be beneficial if it reuses freed blocks from the GC.
Aiden Bell
@Aiden, agreed.
Gollum
@Gollum: How is a GC a "big pain" in that context?
Jon Harrop
+1  A: 

It is almost impossible to make a non-GC memory manager work in a multi-threaded environment without requiring a lock to be acquired and released every time memory is allocated or freed. A garbage-collection-based system can allow memory allocations to occur without requiring any locks. This is a major advantage. The disadvantage is that when garbage collection occurs, everything else has to stop until it's complete.

If processors were to return to a descriptor-based handle/pointer system (similar to what the 80286 used, though nowadays one wouldn't use 16-bit segments anymore), it would be possible for garbage collection to be done concurrently with other operations (if a handle was being used when the GC wanted to move it, the task using the handle would have to be frozen while the data was copied from its old address to its new one, but that shouldn't take long). I'm not sure that will ever happen, though.

(Incidentally, if I had my druthers, an object reference would be 32 bits, and a pointer would be an object reference plus a 32-bit offset; I think it will be a while before there's a need for over 2 billion objects, or for any object over 4 gigs. Despite Moore's Law, if an application had over 2 billion objects, its performance would likely be improved by using fewer, larger objects; if it needed an object over 4 gigs, its performance would likely be improved by using more, smaller objects.)
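The lock-free allocation claim can be illustrated on the JVM, where each thread normally allocates from its own thread-local allocation buffer (TLAB), so an ordinary `new` takes no global lock. A sketch (the TLAB behaviour itself is a HotSpot implementation detail, not something the code below can observe directly):

```java
import java.util.concurrent.atomic.AtomicLong;

public class ParallelAllocDemo {
    // Four threads allocate heavily and concurrently; on HotSpot each
    // 'new' is serviced from that thread's own TLAB, with no allocator
    // lock acquired per allocation.
    static long run() throws InterruptedException {
        final AtomicLong allocated = new AtomicLong();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    byte[] block = new byte[64];       // lock-free fast path
                    allocated.addAndGet(block.length); // tally bytes requested
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return allocated.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());  // 4 * 100000 * 64 = 25600000
    }
}
```

A malloc/free-based runtime needs either a global heap lock or per-thread arenas (which modern allocators like jemalloc do provide), so the contrast is sharpest against a naive locking allocator.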

supercat
Interesting point of view. By the way, 2 billion objects / 4 GB per object? By that time, maybe there will be an alternative to OO in mainstream dev, I guess :-)
ragu.pattabi
+1  A: 

Typically, garbage collection has certain disadvantages:

* Garbage collection consumes computing resources in deciding what memory is to be freed, reconstructing facts that may have been known to the programmer. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, often leading to decreased or uneven performance. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.
* The point when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments such as device drivers, in transaction processing, or in interactive programs.
* Memory may leak despite the presence of a garbage collector if references to unused objects are not themselves manually disposed of. This is described as a logical memory leak. For example, recursive algorithms normally delay release of stack objects until after the final call has completed. Caching and memoizing, common optimization techniques, commonly lead to such logical leaks. The belief that garbage collection eliminates all leaks leads many programmers not to guard against creating them.
* In virtual memory environments typical of modern desktop computers, it can be difficult for the garbage collector to notice when collection is needed, resulting in large amounts of accumulated garbage, a long, disruptive collection phase, and other programs' data swapped out.
* Perhaps the most significant problem is that programs that rely on garbage collectors often exhibit poor locality (interacting badly with cache and virtual memory systems), occupy more address space than the program actually uses at any one time, and touch otherwise idle pages. These may combine in a phenomenon called thrashing, in which a program spends more time copying data between various grades of storage than performing useful work. They may make it impossible for a programmer to reason about the performance effects of design choices, making performance tuning difficult. They can lead garbage-collecting programs to interfere with other programs competing for resources.
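The "logical leak" bullet is easy to reproduce: a strongly referenced cache keeps every entry alive forever, even if the program never reads them again. One common mitigation in Java is a `WeakHashMap`, whose entries become collectable once the key is otherwise unreachable; the class and cache names below are invented for the sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class LogicalLeakDemo {
    // Strongly referenced cache: entries stay reachable (and therefore
    // uncollectable) as long as the map itself does. This is the
    // "logical leak" -- the GC cannot free what is still reachable.
    static final Map<String, byte[]> leakyCache = new HashMap<>();

    // Weak-keyed cache: once a key is unreachable elsewhere, the GC may
    // clear the entry at some point of its choosing.
    static final Map<Object, byte[]> weakCache = new WeakHashMap<>();

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            leakyCache.put("key-" + i, new byte[1024]);  // never evicted
        }
        System.out.println(leakyCache.size());  // 1000

        Object key = new Object();
        weakCache.put(key, new byte[1024]);
        System.out.println(weakCache.containsKey(key));  // true while 'key' is held
        // Once 'key' becomes unreachable, the entry is eligible for
        // removal -- exactly when is up to the collector.
    }
}
```

Note that `WeakHashMap` only helps when the *key's* lifetime naturally tracks the cached data; for size- or time-bounded eviction you still need an explicit cache policy.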
gsoni
Quite amazing that I forgot to check Wikipedia for my question. Thanks! http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Disadvantages
ragu.pattabi
-1: Almost entirely wrong.
Jon Harrop