When the compiler sees this code: `SomeClass foo; int x = foo.bar;`

What process does it follow to retrieve the value of bar? That is, does it look at some data structure representing the class definition? If so, is this data structure generated at compile time or at runtime?

+6  A: 

The compiler has the address of foo. At that address, there is enough space for member variables (sizeof(SomeClass)), which probably includes some padding.

It knows that `bar` is at some position in the class (usually following the order the members were declared, plus some other magic like inheritance), and jumps to that offset.

That is:

struct SomeClass
{
    short s;
    float f;
    int bar;
    char *c;
};

// pseudo-code (padding is whatever alignment requires, often non-zero):
&foo.bar == (address of foo) + sizeof(short) + sizeof(float) + padding;

At run-time, the generated code reads that data and assigns it to x.
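
To make the offset idea concrete, here is a minimal sketch, not part of the original answer. It uses the standard `offsetof` macro to ask the compiler where it actually placed `bar`, and compares that with the naive sum of the preceding member sizes. The names `naive_offset`, `real_offset` and `bar_address` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

struct SomeClass
{
    short s;
    float f;
    int bar;
    char *c;
};

// Sum of the sizes of the members declared before bar (ignores padding).
constexpr std::size_t naive_offset = sizeof(short) + sizeof(float);

// The offset the compiler really uses; it is a compile-time constant.
constexpr std::size_t real_offset = offsetof(SomeClass, bar);

// Padding can only push bar further away from the start, never closer.
static_assert(real_offset >= naive_offset, "bar cannot overlap s and f");

// Computes the address of foo.bar by hand: start of foo plus the offset.
const void *bar_address(const SomeClass &foo)
{
    const unsigned char *base = reinterpret_cast<const unsigned char *>(&foo);
    return base + real_offset;
}
```

On many common platforms `real_offset` is larger than `naive_offset`, because `f` has to be aligned and so there is padding after `s`. The exact numbers depend on the compiler and target.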

GMan
"The compiler has the address of foo" - well, usually it has the offset of foo relative to some stack pointer, since foo is an automatic variable.
Steve Jessop
Which is still an address :)
GMan
This is technically inaccurate. The compiler does not (usually) assign the foo.bar to x. The compiler generates machine instructions which assign foo.bar to x at runtime.
JSBangs
That's what I meant...I'll add at run-time I suppose, though I thought that would be implied.
GMan
outis
@GMan: well generally at compile time, all the compiler knows is the offset. Normally the OS will provide the actual address of the stack at runtime. Obviously there exist architectures where the stack is mapped at a fixed virtual (or even physical) address, and so the compiler could know the actual address. But I don't think that should be taken as the norm.
Steve Jessop
Oh, I see what you're saying. I'm in the run-time mindset, but you can edit it to make it more clear if you'd like.
GMan
I don't think I need to edit, it just depends what's understood by "has". At compile time the compiler knows how it will calculate the address, and it knows that by runtime, the value will have been calculated and (in unoptimised code) will be in a particular register. So it "has" it in that register (meaning, it knows it will be there in the future). I'm just assuming that since the question is all about what information is where during compile time and runtime, the questioner will be interested in the little details that make all the difference between compilers and interpreters.
Steve Jessop
+2  A: 

The process starts when the compiler sees the definition for SomeClass. Based on that definition, it builds an internal structure that contains the types of the fields in SomeClass, and the locations of the code for the methods of SomeClass.

When you write SomeClass foo; the compiler finds the code that corresponds to the constructor for SomeClass, and creates machine instructions to call that code. On the next line you write int x = foo.bar. Here the compiler writes machine instructions to allocate stack space for an int, and then looks at its data structure for SomeClass. That data structure will tell it the offset in bytes of bar from the beginning of the foo object. The compiler then writes machine code to copy the bytes corresponding to bar into the memory for x. All of this machine code gets written into your executable.

Generally, the data structures representing SomeClass and other definitions are thrown away once compilation is done. What you have left is just a set of machine instructions. Those instructions are executed when you actually run your program, so that the constructor for SomeClass and the code to copy foo.bar into x are executed by the CPU without any explicit knowledge of the structure of your objects.

This is the general case. There are special cases for when you run your code under a debugger and for optimization, but this is generally what happens.
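
As a rough illustration of "only machine instructions are left", the sketch below puts the code as written next to a hand-lowered approximation that works on raw bytes and constants. It is an assumption-laden sketch, not what any particular compiler emits: the constructor body, the value 7, and the names `as_written`, `lowered_constructor` and `roughly_as_emitted` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

struct SomeClass
{
    int bar;
    SomeClass() : bar(7) {}   // hypothetical constructor; 7 is arbitrary
};

// What the programmer writes.
int as_written()
{
    SomeClass foo;
    int x = foo.bar;
    return x;
}

// Facts the compiler looks up in its internal tables while compiling.
// After compilation they survive only as constants inside instructions.
constexpr std::size_t kSize      = sizeof(SomeClass);
constexpr std::size_t kAlign     = alignof(SomeClass);
constexpr std::size_t kBarOffset = offsetof(SomeClass, bar);

// Stand-in for the constructor's machine code: it sees only raw storage.
void lowered_constructor(unsigned char *self)
{
    const int initial = 7;
    std::memcpy(self + kBarOffset, &initial, sizeof initial);
}

// Stand-in for the code emitted for the two statements in the question.
int roughly_as_emitted()
{
    alignas(kAlign) unsigned char foo_storage[kSize];  // stack space for foo
    lowered_constructor(foo_storage);                  // call the constructor
    int x;                                             // stack space for x
    std::memcpy(&x, foo_storage + kBarOffset, sizeof x); // copy bar's bytes
    return x;
}
```

Nothing in `roughly_as_emitted` mentions the member name `bar`; only a size, an alignment and an offset remain, which is the point the answer makes.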

JSBangs
+1  A: 

You have to think that during compilation every class is turned into a struct (to simplify explanation), so if you have

class Foo
{
   int x, y, z;
   char bar[10];
   ... etc ...
};

it is turned into a struct with a fixed size, in this case at least 4*3 + 10 bytes (assuming 4-byte ints, and before any padding is added). The compiler then lays the members out in whatever way suits their alignment requirements, remembering that, for example, the attribute y is at offset 4 while z is at offset 8.

Then it's easy: just add 4 to the address of the object involved in the assignment and you obtain the address of y, and so on.
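
A small sketch of this, written with `sizeof(int)` instead of a hard-coded 4. It is an illustration only: the members are made `public` so they can be inspected from outside, the elided members of the original class are left out, and the names `sum_of_members` and `address_of_y` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

class Foo
{
public:              // public only so this sketch can look at the members
    int x, y, z;
    char bar[10];
};

// Size of the members alone, with no padding counted.
constexpr std::size_t sum_of_members = 3 * sizeof(int) + sizeof(char[10]);

// The real size may be larger, because the compiler may add padding.
static_assert(sizeof(Foo) >= sum_of_members, "members must fit in Foo");

// The size is always a multiple of the alignment, so arrays of Foo work.
static_assert(sizeof(Foo) % alignof(Foo) == 0, "size is padded to alignment");

// "Add the offset to the address of the object" spelled out by hand.
const void *address_of_y(const Foo &obj)
{
    const unsigned char *base = reinterpret_cast<const unsigned char *>(&obj);
    return base + offsetof(Foo, y);
}
```

On typical implementations `offsetof(Foo, y)` equals `sizeof(int)`, and `sizeof(Foo)` is rounded up past `sum_of_members` so that the next `Foo` in an array stays aligned. The standard does not fix the exact values.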

Jack
Use `sizeof(int)` in place of your 4's. `int`s aren't necessarily 4 bytes.
GMan
+1  A: 

The compiler keeps such class metadata only at compile time. Your first question, how it retrieves the value of bar, is actually quite complicated. You could think of it as calculating the offset of bar from the start of the object foo and then reading the memory at that location. Depending on how x is actually used, however, it could do something quite different. In some situations, `x` might not appear in the compiled code at all.

Peter Ruderman
+4  A: 

At compile time, the compiler will have some data structure which tells it how to access each of the members of SomeClass. For simple cases it will just be an offset, but there might be more to it if you have non-trivial inheritance.

In order to handle your expression, the compiler consults this internal data and (eventually) emits the appropriate machine code. By runtime, this structure will have been thrown away, and all that's left is the code emitted to do whatever is needed, starting with the address of foo. However, if you have a pointer-to-member for bar, then the details of how to access the bar member are in some way encapsulated in that pointer value (maybe an offset, maybe something more complex).
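
The pointer-to-member case can be sketched as follows. This is an illustration under assumptions, not a description of any specific ABI: the helper names `read_through_member_pointer` and `read_derived_bar`, and the types `Base1`, `Base2` and `Derived`, are invented for the example.

```cpp
#include <cassert>

struct SomeClass
{
    short s;
    float f;
    int bar;
    char *c;
};

// The member to read is chosen at runtime, so the "how to reach it"
// information has to travel inside the pointer-to-member value itself.
int read_through_member_pointer(const SomeClass &foo, int SomeClass::*member)
{
    return foo.*member;
}

// A case with non-trivial inheritance: bar lives in the second base,
// so it normally does not sit at the start of a Derived object.
struct Base1 { int a; };
struct Base2 { int bar; };
struct Derived : Base1, Base2 {};

int read_derived_bar(const Derived &d)
{
    // &Derived::bar has type int Base2::*; converting it to int Derived::*
    // lets the compiler account for where Base2 sits inside Derived.
    int Derived::*member = &Derived::bar;
    return d.*member;
}
```

For plain data members like these, implementations commonly represent the pointer-to-member as an offset, but that is an implementation detail and the language does not promise it.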

Steve Jessop