views:

466

answers:

9

I was wondering how a Garbage Collector can be implemented with C++ full power of pointers arithmetic. Also, In languages like Java, I can not assign literal addresses to references. In C++ it is very flexible.

I believe that C# has both, but again, the unsafe pointer in C# is the responsibility of the programmer.

EITD :: guys, I am asking if C++ pointers 'as they are currently' can be GCed in theory or not ?

A: 

Yes, a GC can be implemented independent of type safety, which C++ does not have to any great extent; but the optimization opportunities available to the GC will be limited by the amount of type leeway the language permits.

If one is willing to live with runtime-checked pointer operations, thereby re-achieving type safety where the language has given up, then a GC can be a lot more aggressive. As it is, garbage collected C++ is usually restricted to conservative collectors like the Boehm-Demers-Weiser GC.

Barry Kelly
but how does it deal with pointer arithmetic
hhafez
Conservative collection relies on all valid memory having at least one valid pointer to it. If you obscure a reference, the referred memory may be freed; but if a random word points to some memory, it counts as a reference and will keep it alive.
Barry Kelly
how are references maintained? If its something like smart_ptr then the pointers are not raw.
0xC0DEFACE
Conservative collectors can examine every pointer-sized word of valid memory and check to see if it "looks like" a pointer to a valid allocated block, e.g. by a mechanism similar to page tables - a tree of prefixes masked out of the pointer value. For a more aggressive precise collector, at least type info about pointers and runtime checking of pointer operations and arithmetic would be required, to get the required level of type safety.
Barry Kelly
But type info could be provided by a compiler that was engineered in coordination with the garbage collector; as could RTL stubs for pointer arithmetic type-checking.
Barry Kelly
+1  A: 

Boehm GC?

It's written in C not C++, and is not a perfect match for C++ because it does not invoke destructors, but probably fits the bill more or less.

Whatever
It's not 100% reliable though. It may not detect when a piece of data is no longer referenced.
jalf
A: 

I am not an expert on GC implementation but if you allow the use of pointer arithmetic then you can not ever certain that a given address in memory can no longer be referenced.

as an example

int *crazy_pointer;
srand ( time(NULL) );

while(true) {
  crazy_pointer = (int *) rand() % MAX_SIZE_OF_MEMORY:
  printf("$d",*crazy_pointer);
}

Effectively I've used pointer arithmetic in a very basic way to print out the contents of random locations in memory. This means that all the memory on the machine is essentialy reachable. And if it is reachable it can not be GCed.

Yes the above will likely cause a crash on modern Operating Systems but this code is perfectly legal C/C++. There is no way you can implement a GC that can protect against this kind of abuse if you allow pointer arithmetic :)

hhafez
Conservative GC can handle pointers to random locations just fine - it relies on the OS giving it a list of valid memory pages, and simply won't follow pointers to unmapped pages.
Barry Kelly
ok, good it can't go to unmapped pages, that prevent's the crashes, but how does it know that I can no longer reference a peice of memory? I can still asign to crazy_pointer an arbitrary memory location at runtime and the GC will never know what I chose.
hhafez
simple. By saying "what you're doing is undefined behavior". The language doesn't permit you to take the result of rand() and use it as a pointer. It is *not* "perfectly legal C/C++". Pointer arithmetic, according to the standard, is a lot more limited than many people think. :)
jalf
Are you telling me that code, wont compile? It will. How will the GC handle the above code.
hhafez
Just because something compiles doesn't mean it's valid. Your code is perfectly INVALID code, C/C++ prohibits even pointing to memory outside of application allocated memory. (and one past)
GMan
Like GMan said, it will (probably) compile, it just isn't valid. The language isn't required to do *anything* about your code. The compiler doesn't have to compile it, it doesn't have to issue an error either. At runtime, it doesn't have to do what you expect, nor does it have to refrain from formatting your harddrive or downloading porn. It is undefined behavior. The language standard does not define what the effect of this code should be. Welcome to the wonders of C++. :)
jalf
+1  A: 

Do you mean something like smart_ptr?

Paul Nathan
+1  A: 

Yes. There was a system at NeXT back in the 1990s that worked like this: they keep track of every C/C++ memory block that has been allocated. Periodically they scan memory to see if the 4-byte (or 8-byte) value associated with that memory address is present. If it isn't, they assume there are no references, and they free the memory. It's neat and easy! But sometimes it screws up.

Here are some other approaches: http://www.hpl.hp.com/personal/Hans_Boehm/gc/

http://developers.sun.com/solaris/articles/libgc.html

vy32
A: 

Sure why not? Pointer arithmetic from a garbage collector's perspective is no different than indexing into an array. It's things like the xor doubly linked list that screw up GC, but that's not pointer arithmetic, and undefined behavior besides. Also, Boehm exists, which is sort of empirical evidence, it's a garbage collector that works for C++. Heck for a while there they were threatening to include GC as an optional part of C++0x.

Logan Capaldo
+6  A: 

Pointer arithmetic isn't the fundamental problem. GC's have to deal with pointers being reassigned all the time, and pointer arithmetic is just another example of that. (Of course, if pointer arithmetic between pointers pointing to different buffers was allowed, it would cause problems, but it isn't. The only arithmetics you're allowed to perform on a pointer pointing into an array A are the ones that repositions it within that array.

The real problem is the lack of metadata. A GC has to know what is a pointer and what isn't.

If it encounters the value 0x27a2c230, it has to be able to determine if it is

  • a pointer (in which case it has to follow the pointer to mark the destination as "in use" recursively)
  • An integer (The same value is a perfectly valid integer. Perhaps it's not a pointer at all)
  • or something else, say, a bit of a string.

It also has to be able to determine the extent of a struct. Assuming that value is a pointer, and it points into another struct, the GC has to be able to determine the size and extent of that struct, so it knows which range of addresses should be scanned for more pointers.

GC'ed languages have a lot of infrastructure to deal with this. C++ doesn't.

Boehm's GC is the closest you can generally get, and it basically just assumes everything is a pointer, which means some data is needlessly kept alive.

Alternatively, of course all this infrastructure could in principle be added to a C++ compiler. There's no rule in the standard that it's not allowed to exist. The problem is that it would be a major performance hit and eliminate a lot of optimization opportunities.

jalf
A: 

The question is why would you? 90% of the time where you think garbage collection would be nice, a smart pointer vector (as in a vector/map that deletes the object when it is removed) will suffice. Most of the rest of the 10% can be handled by using ref counted interfaces.

Igor Zevaka
It is just more of a theoretical question that puzzled me :)
AraK
A: 

C++ pointer arithmetic allows you to construct N+1 pointers to a T[N], &T[0]...&T[N-1] and a sentinel &T[N-1]+1. These are all pointers to the T[N] array object. In that sense, they are similar to other "interior" pointers such as &foo.bar (address of object member).

Other pointer arithmetic is undefined behavior, and one obvious example of UB could be the unintentional deletion of the array by the GC.

MSalters