Recently I answered another question asking for questions every decent C++ programmer should be able to answer. My suggestion was:

Q: How does a pointer point to an object?
A: The pointer stores the address of that object.

But user R.. disagrees with the answer I proposed to that question. He says: "The correct answer would be 'it's implementation-specific'. While present-day implementations store numeric addresses as pointers, there's no reason it couldn't be something much more elaborate."

I certainly can't deny, just for the sake of disagreeing, that implementations other than storing an address could exist. But I'm really interested in which other implementations are actually used.

What other implementations of pointers are actually used in C++, apart from storing an address in an integer-type variable? How is casting (especially dynamic_cast) implemented?

+6  A: 

On a conceptual level, I agree with you -- I define the address of an object as "the information needed to locate the object in memory". What the address looks like, though, can vary quite a bit.

A pointer value these days is usually represented as a simple, linear address... but there have been architectures where the address format isn't so simple, or varies depending on type. For example, programming in real mode on an x86 (e.g. under DOS), you sometimes have to store the address as a segment:offset pair.

See http://c-faq.com/null/machexamp.html for some more examples. I found the reference to the Symbolics Lisp machine intriguing.

Jander
Reminds me of far pointers when I did real-mode C programming a couple of years back. Good times... Linear addressing is so boring :)
Matthew Iselin
Actually, segment:offset addressing was also used in 16-bit OS/2 (versions 1.x). This OS was based on the Intel 80286, which supported "protected mode" with 24-bit physical addressing and 16:16 (selector:offset) logical addressing, where the selector chose a segment descriptor that in turn supplied the segment's physical base address.
LaszloG
A: 

Smart pointers are pointers.

Pointers to non-static member functions may be complex structures containing information about virtual function tables.

An iterator is a generalized pointer.

So the correct question should probably be:

Q: How does a T* point to an object of type T? (Here T is not a non-static member function type.)
A: When you dereference a value of type T*, it contains the address of that object. (At any other time it can contain anything.)
Abyx
A smart pointer usually stores an ordinary pointer anyway. Could you please provide more details on how pointers to members contain information about vtables?
sharptooth
Smart pointers are just classes wrapping a raw pointer. Iterators are a typedef. Both end up storing an address; they just have different runtime usage and effects.
Matthew Iselin
@sharptooth Yes, smart pointers contain a value of type T*, but not every such value is a memory address. Different compilers use different techniques to implement virtual functions. There is no common answer; some compilers don't use vtables at all. Read your compiler's documentation.
Abyx
@Matthew Iselin: Usually an iterator is a class with overloaded operators; only in some rare cases is it a typedef, purely to increase performance.
Abyx
@Abyx: ah, of course. Friday afternoon does amazing things to one's brain :)
Matthew Iselin
+5  A: 

I would call Boost.Interprocess as a witness.

In Boost.Interprocess, interprocess pointers (offset_ptr) store an offset rather than an absolute address: the distance from the pointer's own location to the pointee, both of which lie inside the mapped memory area. This allows one process to place a pointer in shared memory and another process to map the same area (possibly at a different base address than in the first process) and still reach the same object.

Therefore, interprocess pointers are not represented as addresses, but they can be resolved as one.

Thanks for watching :-)

Matthieu M.
A sincere welcome to [the league of extraordinary C++ contributors](http://stackoverflow.com/badges/49/c?userid=147192)! :-)
James McNellis
@James McNellis: Does this mean I am doomed :x ?
Matthieu M.
+2  A: 

You can use segmented pointers: basically, you divide memory into blocks of a fixed (small) size, then group those blocks into segments (larger collections of blocks), also of fixed size. A pointer to an object can then be stored as Seg:Block.

+-----------------------------------------------------------+
|Segment 1 (addr: 0x00)                                     |
| +-------------------------------------------------------+ |
| |Block 1|Block 2|Block 3|Block 4|Block 5|Block 6|Block 7| |
| +-------------------------------------------------------+ |
+-----------------------------------------------------------+
|Segment 2 (addr: 0xE0)                                     |
| +-------------------------------------------------------+ |
| |Block 1|Block 2|Block 3|Block 4|Block 5|Block 6|Block 7| |
| +-------------------------------------------------------+ |
+-----------------------------------------------------------+
|Segment 3 (addr: 0x1C0)                                    |
| +-------------------------------------------------------+ |
| |Block 1|Block 2|Block 3|Block 4|Block 5|Block 6|Block 7| |
| +-------------------------------------------------------+ |
+-----------------------------------------------------------+

So say we have the pointer 2:5, each segment is 7 blocks and each block is 32 bytes. Since both segments and blocks are numbered from 1 in the diagram, 2:5 can be translated into a flat (x86-style) address by computing ((2 - 1) * (7 * 32)) + ((5 - 1) * 32), which yields 0x160 from the start of the first segment.

Necrolis
+3  A: 

If we are familiar with accessing array elements using pointer arithmetic, it is easy to understand how objects are laid out in memory and how dynamic_cast works. Consider the following simple class:

struct point
{
    point (int x, int y) : x_ (x), y_ (y) { }
    int x_;
    int y_;
};

point* p = new point(10, 20); 

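Assume that p points to the memory location 0x01. To make the mechanics easy to see, imagine that its member variables are stored in their own separate locations, say x_ at 0x04 and y_ at 0x07, so the object can be visualized as an array of pointers. (This is only a teaching model: real C++ compilers store the members directly inside the object, with x_ at offset 0 and y_ after it.) p (0x01 in our case) points to the beginning of the array: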

0x01
+-------+-------+
|       |       |
+---+---+----+--+
    |        |
    |        |
   0x04     0x07
 +-----+   +-----+
 |  10 |   | 20  |
 +-----+   +-----+

In this model, code that accesses the fields essentially becomes array-element access via pointer arithmetic:

p->x_; // => **p
p->y_; // => *(*(p + 1))

If the language supports some kind of automatic memory management, such as GC, additional fields may be added to the object array behind the scenes. Imagine a C++ implementation that collects garbage with the help of reference counting. The compiler might then add an extra field (rc) to keep track of that count. The array representation above then becomes:

0x01
+-------+-------+-------+
|       |       |       |
+--+----+---+---+----+--+
   |        |        |
   |        |        |
  0x02     0x04     0x07
+--+---+  +-----+   +-----+
|  rc  |  |  10 |   | 20  |
+------+  +-----+   +-----+

The first cell holds the address of the reference count. The compiler will emit appropriate code so that accesses to the parts of p visible to the outside world skip over it:

p->x_; // => *(*(p + 1))
p->y_; // => *(*(p + 2))

Now it is easy to understand how dynamic_cast works. Compilers typically handle polymorphic classes by adding an extra hidden pointer (the vptr) to the object's representation. This pointer holds the address of another 'array' called the vtable, which in turn contains the addresses of the implementations of the class's virtual functions. One entry of the vtable is special: it does not point to a function but to the class's type_info object. (In common implementations, such as the Itanium C++ ABI used by GCC and Clang, this entry sits just before the first virtual function pointer.) This object holds the run-time type information of the class and, in common implementations, links to the type_info objects of its base classes. Consider the following example:

class Frame
{
public:
    virtual void render (Screen* s) = 0;
    // ....
};

class Window : public Frame
{ 
public:
    virtual void render (Screen* s)
    {
        // ...
    }
    // ....
private:
   int x_;
   int y_;
   int w_;
   int h_;
};

An object of Window will have the following memory layout:

window object (w)             vtable                    Window type_info      Frame type_info
+-----------+                 +-------------------+     +----------------+    +----------------+
| &vtable   +---------------->| &type_info        +---->|                +--->|                |
+-----------+                 +-------------------+     +----------------+    +----------------+
|  &x_      |                 | &Window::render() |
+-----------+                 +-------------------+
|  &y_      |
+-----------+
|  &w_      |
+-----------+
|  &h_      |
+-----------+

Now consider what happens when we cast a Window* w to a Frame*:

Frame* f = dynamic_cast<Frame*> (w);

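Because Window derives publicly from Frame, this particular cast is an upcast: the compiler resolves it statically, just like the implicit conversion Frame* f = w;, and no run-time lookup is needed. The type_info links matter in the other direction, e.g. dynamic_cast<Window*>(f): dynamic_cast follows the vptr of the object f points to, reaches its type_info, checks whether Window is the object's dynamic type or an unambiguous public base of it, and if so returns a (suitably adjusted) pointer. If the check fails, the result is a null pointer, indicating that the cast failed. The vtable provides an economical way to reach the type_info of a class, which is one reason why dynamic_cast (beyond upcasts) works only on polymorphic types, i.e. classes with virtual functions. Restricting dynamic_cast to polymorphic types also makes sense from a logical point of view: if an object has no virtual functions, it cannot safely be manipulated without knowledge of its exact type.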

The target type of dynamic_cast need not be polymorphic; for a run-time cast, only the source type must be. This allows us to wrap a concrete type in a polymorphic type and cast across to it:

// no virtual functions
class A 
{
};

class B
{
public:
    virtual void f() = 0;
};

class C : public A, public B
{
    virtual void f() { }
};


C* c = new C;
B* b = c;                    // the B subobject of *c
A* a = dynamic_cast<A*>(b);  // OK: cross-cast; B is polymorphic, A need not be
Vijay Mathew
+2  A: 

Pointers to objects do store (representations of) what C++ calls "addresses". 3.9.2/3, "A valid value of an object pointer type represents either the address of a byte in memory (1.7) or a null pointer (4.10)."

I think it's fair to say therefore that they "store" addresses, it's just that saying so doesn't convey much. It's just another way of saying what pointers are. They may store other information as well, and they may store the actual physical/virtual numeric address by reference to some other structure elsewhere, but in terms of C++ semantics, a pointer variable contains an address.

Abyx raises the issue that only object and function pointers represent addresses. Pointers-to-member don't necessarily represent an address, as such. But the C++ standard specifically says that the word "pointers" in the standard shouldn't be taken to include pointers-to-member. So you might not count that.

Other than segment:offset (which obviously is an address consisting of two numbers), the most plausible "funny pointer" I can think of would be one in which some type information is contained in the pointer. It's unlikely in C++ that you'd want to fiendishly optimize RTTI at the cost of reducing the space you can address, but you never know.

Another possibility is that if you were implementing a garbage-collected C++, then each pointer could store information about whether it points to stack or heap, and perhaps you could sneak in some information to help with accurate vs. conservative marking.

I've not encountered anyone doing either of those things with pointers in C++, though, so I can't vouch for them being real uses. There are other ways of storing type and GC information, which might well be better.

Steve Jessop