views:

698

answers:

13

I have heard that it was suboptimal for C to automatically collect garbage -- is there any truth to this?

Was there a specific reason garbage collection was not implemented for C?

+16  A: 

C was invented back in the early 1970s for writing operating systems and other low level stuff. Garbage collectors were around (eg. early versions of Smalltalk) but I doubt they were up to the task of running in such a lightweight environment, and there would be all the complications of working with very low level buffers and pointers.

winwaed
This is a good answer, but the 2nd paragraph of wllmsaccnt's answer was more what I was looking for.
Raven Dreamer
+1 For pointing out that GC's were *available* at the time C was created. Many people seem to forget that all these "modern bells and whistles" have existed (in some form or another) for many decades now.
pst
@Raven Dreamer: It may be more what you were looking for -- but unfortunately, it's almost completely wrong. @winwaed's answer is *much* more accurate.
Jerry Coffin
pst, my point is that C is C because C was designed as C. If it were designed with a garbage collector it probably would not been as useful to early systems programmers and met the same needs of early programmers. It doesn't have the bells and whistles not because they weren't available, but because systems were slow back then and garbage collection wasn't a priority.
wllmsaccnt
+12  A: 

C is a very old language and lacks many of the bells and whistle's of modern languages. To add garbage collection now would require a major respec of the language. Generally anyone willing to make that many changes to C would just rather create their own language.

Adding automatic garbage collection onto a language generally will either reduce performance, or will cause the garbage collection to happen at un forseen times. To add garbage collection to C would cause it to lose one of it's comparative advantages, in that it can be used for system level programming that requires real time or near real time response times.

wllmsaccnt
I'll go out on a limb here and say that while I agree with your answer in general, there is no such thing as real time. All a "realtime" program does is run as fast as is required - you can have "realtime" systems that respond in hours or days. I think that the concept you are looking for is "deterministic". Adding GC would have made C less deterministic and hence harder to verify that it would have achieved the desired response for times required by the users expectations of the day.
Peter M
Unfortunately, most of this is complete nonsense. There are only a *few* (mostly trivial) changes necessary to allow conservative GC. Allowing "precise" GC requires more changes, but still *well* short of a "major respec". There are real-time garbage collectors that provide much *more* deterministic timing than the heap managers used in most implementations of malloc/calloc/realloc/free.
Jerry Coffin
C is entrenched. To add anything to the spec at this point would be a major re specification. I was referring to the resistance that would be put up by the C developer community in implementing the needed changes to compilers, not the amount of work that would go into the actual garbage collection implementation on the language end. For most C developers it would be idiomatically...blasphemy.
wllmsaccnt
@Jerry: Any non-threaded `malloc` implementation has trivial-time, `O(1)` allocation and deallocation, unless the author was completely stupid. With threads the situation is a little worse, but claiming it's less deterministic than GC is utter nonsense.
R..
@R: Pardon my being blunt, but you don't seem to have a clue of what you're talking about. Though certainly not brand new, ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps gives some idea of how real allocators work.
Jerry Coffin
Pro-GC propaganda is not the source I'd go to for information on how real allocators work. My classic source is `dlmalloc`, but I'll admit it's been a long time since I read the details.
R..
+2  A: 

1972.

C was designed in 1972.

Worse, c was designed on obsolete hardware. In 1972.

Don't get me wrong. They had garbage collection in 1972, but all the issues that people complain about to this day were real, very valid concerns at the time.

dmckee
They're still real, valid concerns.
R..
Unix was originally designed for/written on obsolete hardware -- but by the time they invented C, they were using current hardware.
Jerry Coffin
Smalltalk also 1972.
Amigable Clark Kant
@Amigable: but they also designed custom hardware specifically to run Smalltalk.
Jerry Coffin
@Jerry Coffin, I did not know that. Interesting.
Amigable Clark Kant
+1  A: 

Garbage Collection is not useful if you want to make hardcore optimisations, because if you decide the moment when it's best to free allocated space, then you can make a much better job in your project than the garbage collector does. In the 1970s I don't know the exact cause why the developers decided to not include a garbage collector to C, probably they wanted to have absolute control of deallocation, however I'm sure later they didn't even consider to add a garbage collector, because even if there is an integration of a garbage collector to the language which is very good, there were built a lot of C projects and the new versions of the language must have backwards compatibility.

EDIT: "There are also compilers, libraries and operating system level mechanisms for performing array bounds checking, buffer overflow detection, serialization and automatic garbage collection, that are not a standard part of C." This may be interesting for you, you can read more here.

Lajos Arpad
+4  A: 

C is a very low level language. It's the kind of language you might choose to write a higher-level language with things such as garbage collection in. It's small and simple, and does exactly what you ask.

It took C++ to build on C and add more sophisticated/automatic memory management (things like calling destructors when objects went out of scope). You may then wonder why C++ doesn't have garbage collection, in which case see what Stroustrup has to say: briefly, people want to do things in a more direct manner, and people who really want it can use a library (which can also be used in C).

beldaz
+11  A: 

Garbage collection has been implemented for C (e.g., the Boehm-Demers-Weiser collector). C wasn't specified to include GC when it was new for a number of reasons -- largely because for the hardware they were targeting and system they were building, it just didn't make much sense.

Edit (to answer a few allegations raised elsethread):

  1. To make conservative GC well-defined, you basically only have to make one change to the language: say that anything that makes a pointer temporarily "invisible" leads to undefined behavior. For example, in current C you can write a pointer out to a file, overwrite the pointer in memory, later read it back in, and (assuming it was previously valid) still access the data it points at. A GC wouldn't necessarily "realize" that pointer existed, so it could see the memory as no longer being accessible, and therefore open to collection, so the later dereference wouldn't "work".

  2. As far as garbage collection being non-deterministic: there are real-time collectors that are absolutely deterministic and can be used in hard real-time systems. There are also deterministic heap managers for manual management, but most manual managers are not deterministic.

  3. As far as garbage collection being slow and/or thrashing the cache: technically, this is sort of true, but it's purely a technicality. While designs (e.g., generational scavenging) that (at least mostly) avoid these problems are well known, it's open to argument that they're not exactly garbage collection (even though they do pretty much the same things for the programmer).

  4. As for the GC running at unknown or unexpected times: this isn't necessarily any more or less true than with manually managed memory. You can have a GC run in a separate thread that runs (at least somewhat) unpredictably. The same is true of coalescing free blocks with manual memory management. A particular attempt at allocating memory can trigger a collection cycle, leading to some allocations being much slower than others; the same is true with a manual manager that uses lazy coalescing of free blocks.

  5. Oddly, GC is much less compatible with C++ than with C. Most C++ depends on destructors being invoked deterministically, but with garbage collection that's no longer the case. This breaks lots of code -- and the better written the code, the bigger of a problem it generally causes.

  6. Likewise, C++ requires that std::less<T> provide meaningful (and, more importantly, consistent) results for pointers, even when they point to entirely independent objects. It would require some extra work to meet this requirement with a copying collector/scavenger (but I'm pretty sure it is possible). It's more difficult still to deal with (for example) somebody hashing an address and expecting consistent results. This is generally a poor idea, but it's still possible, and should produce consistent results.

Jerry Coffin
And it has serious irreparable flaws. See my answer.
R..
@R: At least IMO, your answer doesn't point to truly irreparable flaws. Rather, it points to a problem that's mostly theoretical, and even in the fairly rare case that it might become significant, it's easy to work around. That's not to say GC is perfect by any means, but your answer doesn't strike me as showing its strengths or weaknesses particularly accurately either.
Jerry Coffin
@Jerry: An easily-exploitable DOS vulnerability is not "mostly theoretical".
R..
And "coalescing free blocks" is not some behemoth in `malloc`/`free` like you pretend. It's simply checking a block's neighbors on `free` and merging the neighbors if they're unused, which is trivial in the common single-thread case where no locking is required.
R..
1) I never said anything like "behemoth", nor did I "pretend" anything. Your claim is an outright falsehood. 2) Any "exploit" requires (at bare minimum) knowing addresses of dynamically allocated memory, and being able to get the program to read and store arbitrary data (more or less intact). While both are undoubtedly possible in at least some cases, neither is anywhere close to universally true. There are *lots* of simpler attacks that apply much more widely.
Jerry Coffin
De Bruijn sequences make the guesswork a lot easier, and other (but possibly mutually exclusive) heuristics should let you guess lots of valid pointers as well. I have little doubt that such an attack would be very feasible against web browsers, chat clients, etc. As for putting words in your mouth, I apologize.
R..
I like "but most manual managers are not deterministic". People tend to forget that.
Amigable Clark Kant
I may be slinging FUD here, but with Rails, in a language which has _real_ GC, it is deemed perfectly normal to restart several times a minute, because of memory problems. So GC, even real GC, is no silver bullet and a little leniency goes a long way. And this comes from someone using libgc.
Amigable Clark Kant
+15  A: 

Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.

The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers encountered. Some higher level languages limit integers not to use all the bits available, so that high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc.

What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network/files/etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply De Bruijn Sequences.

Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:

  • Unpredictability
  • Cache pollution
  • Time spent walking all memory

The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc/free that the overhead is not even measurable.

R..
Scan memory? Don't you just count references like java does?
CromTheDestroyer
Reference counting and garbage collection are not the same thing. Reference counting is clean and elegant, but cannot automatically free circular linked lists and other circularly-referenced objects, and it's very difficult to make it efficient in a multithreaded environment where different threads could be trying to increment or decrement the reference count at the same time.
R..
-1 I can think of at least two garbage collectors for C off the top of my head. So "there are fundamental problems with GC that cannot be overcome which make it incompatible with C" is a false statement.
JeremyP
R. thanks for the tip about the De Bruijn Sequences though. I never really thought about the possibility of a DOS attack this way. I will be sure to clean and free my input data thoroughly.
Amigable Clark Kant
@JeremyP: And they're subject to the flaws I explained in my post. The fact that someone can write a buggy, vulnerable implementation does not contradict my claim that one cannot write a correct, robust implementation.
R..
@R..: but it is possible to write a correct enough robust enough implementation.
JeremyP
@JeremyP: It is not possible. That is the essence of my answer. A program which leaks memory at the control of an attacker is not remotely robust, much less "robust enough".
R..
+4  A: 

As a language, C was intentionally designed to be really small. It has very few intrinsic operations or features, and most of those reflect the basic instructions found in CPUs. It's often been called a "portable assembly language".

That makes it great for writing portable programs to switch telephone calls around, that run on dinky little computers with very little memory, which is part of what Bell Labs was after in the early 70s when they were first working on C and Unix.

That also makes C great for writing OS kernels, device drivers, and so on. Such executables can't count on having rich, sophisticated runtime environments like garbage collection - or even just dynamic memory allocation like malloc(). That's because they actually form the foundation for those environments, so you run into the "pulling yourself up by your bootstraps" problem.

People have written garbage-collected memory allocation libraries for C. But libraries aren't part of the C language itself, and it's very unlikely something that complex would ever be accepted as part of the standard C libraries. C programmers are a very conservative lot around changes to the standard.

Bob Murphy
+5  A: 

I'm sorry, but saying that C doesn't have GC because it's old or because it was designed for different hardware is a bit ridiculous.

Asking why does C has no GC but other languages do is a bit like asking "why does a hammer have a blunt end while an axe has a sharp end"? Well, languages are tools, and different tools were meant to build different programs with different requirements. Some programs, like device drivers or operating system kernels, have requirements that make it nearly impossible to implement garbage collection while still satisfying them, so a language like C was needed, so C was created.

Just like every other tool, it is the way it is because the problems it was meant to solve forced it to be that way. People who say it was because it was created in the 70s or because of old hardware make it seem like it was completely circumstantial. If this was true, C would have gone away with the 70s, but it's still a popular language.

Besides, gc is very hard to implement cleanly without some sort of runtime system, but C is meant to run on bare hardware.

CromTheDestroyer
"Some programs, like device drivers or operating system kernels, have requirements that make it nearly impossible to implement garbage collection while still satisfying them, so a language like C was needed, so C was created." -- Could you give a specific example?
Raven Dreamer
I like the last paragraph more than the rest of the answer.
Amigable Clark Kant
+2  A: 

Garbage collection could be implemented for C, though with normal CPU hardware it would be somewhat difficult and inefficient. To really make garbage collection work well, one would need an architecture which used base+offset addressing for everything; the base part of every pointer would be required to always point at the start of an allocated region. Note that if the base were a "selector" rather than a linear address, a 32-bit selector plus a 32-bit offset could access 4 billion objects, each of which could be up to 4 gigs. I would think such a design could be more efficient than 64-bit linear addressing for many applications, since object references and offsets would use half as much cache space as with 64-bit pointers. Using 64 bits just seems like a waste.

BTW, an architecture which used base+offset address for everything would have another advantage over a linear-addressing system: garbage collection could be run concurrently with other operations. The only time garbage collection would have to disturb a running thread would be if the thread tried to access a data object while that particular object was being relocated. Under that scenario, either the relocation attempt would have to be aborted (assuming it was a non-overlapping copy), or the thread would have to wait for that particular object to be copied. A much better situation for real-time code than exists when all threads have to be suspended for GC.

supercat
+3  A: 

Objective_C is ANSI C with something like 12 keywords added and a type called id. It has garbage collection. It uses reference counting.

ValiRossi
Reference counting is completely different from garbage collection, and suffers none of GC's critical flaws. Unfortunately the academic perverts care more about being able to write circular linkes lists...
R..
It has real garbage collection too.
JeremyP
A: 

I like what JWZ has to say about C. :-)

C is "the PDP-11 assembler that thinks it's a language".

So, if you look at how you would create a portable assembler, then C looks totally right without garbage collection, for the simple reason CPUs do not have a "garbage collect" instruction.*

(While most other CPU instructions are captured just fine with C to this day, some are not though, like C lack of expressiveness for SIMD operations etc.)

* Yes, I know some of you will find an example to prove me wrong. But in the general case...

Amigable Clark Kant