I have always wanted to know the real difference between how the compiler sees a pointer to a struct (in C, say) and the struct itself:

```c
struct person p;
struct person *pp;
```

For `pp->age`, I always imagine that the compiler computes "value of `pp` + offset of the attribute `age` in the struct".

But what does it do with `p.age`? It would be almost the same. For me, "the programmer", `p` is not a memory address; it's like "the structure itself", but of course this is not how the compiler deals with it.

My guess is that it's more of a syntactic thing, and that the compiler always does `(&p)->age`.

Am I correct?

+28  A: 

p->q is essentially syntactic sugar for (*p).q in that it dereferences the pointer p and then goes to the proper field q within it. It saves typing for a very common case (pointers to structs).

In essence, `->` does two dereferences (pointer dereference, field dereference) while `.` only does one (field dereference).

Because of this extra dereference, the compiler can't reduce `->` to a static address. It always involves at least an address computation at run time, because pointers can change dynamically and so can the locations they refer to. By contrast, the compiler can sometimes replace a `.` operation with an access to a fixed address, since the base struct's address can be fixed as well.

Amber
+1 for "syntactic sugar" (and a good explanation)
advs89
OK. But my question is: what exactly is `p` to the compiler? It's also an address, isn't it? But to me, the programmer, it is "the structure itself".
fsdfa
Assuming `p` is a local variable, it's an offset within the stack.
Amber
I just can't agree with calling that anything like 'sugar'; it has always been the syntax that annoys me the most in C! I would call it a 'shortcut', and a fugly one. +1 for the great explanation.
Fabiano PS
@Fabiano - "syntactic sugar" is a fairly common term in the CS world: http://en.wikipedia.org/wiki/Syntactic_sugar
Amber
And what's the difference (for the compiler) between `pp` and `*pp`?
fsdfa
`pp` is "the value contained in memory where the variable pp is stored". `*pp` is "the value contained in some other memory which begins at the address that is contained in the memory where the variable pp is stored"
Amber
And `.` does an offset thing, I guess. I'm satisfied.
fsdfa
@Amber - I'm not saying you are wrong, just that I don't agree about it being sugar. From your link: "It makes the language "sweeter" for humans to use". For me it is sour; if it were '..' instead, then it would be sugar...
Fabiano PS
@Fabiano PS: "Syntactic sugar" is a pretty standard term (and Alan Perlis quipped that it "causes cancer of the semicolon"). Bear in mind that some people don't like sugar (and would you really prefer `(*p).` to `p->`?).
David Thornley
@fsdfa - The answer to your question about `p`: at runtime `p` doesn't really exist. It exists implicitly in the code that's used to modify a specific set of memory locations. At compile time `p` is just a name the compiler internally uses to represent a set of memory locations an operation is being done on. Frequently the compiler doesn't know what those memory locations are, except that they exist as offsets to other memory locations that are only known at runtime (such as the stack pointer). So `p` generally never exists anywhere as an actual pointer.
Omnifarious
+5  A: 
e.James
If it is declared on the stack it doesn't know the exact address at compile time, only the offset against the stack pointer.
cjg
When it sees p.age it knows the **offset** within the struct, but doesn't (unless it's a global variable) know the absolute address of the struct.
ChrisW
@cjg and @ChrisW: Good points. I'll adjust my answer.
e.James
To whomever reversed their downvote: thank you! I've never seen that done before `:)`
e.James
+1  A: 

Since `p` is a local (automatic) variable, it is stored on the stack. Therefore the compiler accesses it as an offset from the stack pointer (SP) or the frame pointer (FP, or BP on architectures that have one). In contrast, `*pp` refers to memory at an address [usually] allocated on the heap, so the stack registers are not used.

Tomer Vromen
You can pass a stack-allocated struct by address to a subroutine: in which case the pointer received by the subroutine is to an object on the stack.
ChrisW
@ChrisW: Good point. Of course, the compiler still accesses the struct as a straightforward address (without using SP).
Tomer Vromen
+3  A: 

The two statements are not equivalent, even from the "compiler perspective". The statement p.age translates to the address of p + the offset of age, while pp->age translates to the address contained in pp + the offset of age.

The address of a variable and the address contained in a (pointer) variable are very different things.

Say the offset of age is 5. If p is a structure, its address might be 100, so p.age references address 105.

But if pp is a pointer to a structure, its address might be 100, but the value stored at address 100 is not the beginning of a person structure, it's a pointer. So the value at address 100 (the address contained in pp) might be, for example, 250. In that case, pp->age references address 255, not 105.

Tyler McHenry
In other words, reading from `p.age` conceptually requires one memory read, from the (known) location of `p` plus the offset of `age`. Whereas reading from `pp->age` conceptually requires two memory reads - one from the location of `pp`, then a second from the location given by the first read plus the offset of `age`.
caf
A: 

In both cases the structure and its members are addressed by

address(person) + offset(age)

Using `p` with a struct stored in stack memory gives the compiler more options to optimize memory usage. If nothing else in the struct is used, it could store only `age` instead of the whole struct, and then the address formula above no longer applies (I think taking the address of the struct prevents this optimization).
A struct on the stack may have no memory address at all: if it is small enough and short-lived, it can be kept in some of the processor's registers (subject to the same caveat about taking its address).

The short answer: when the compiler does not optimize, you are right. As soon as the compiler starts optimizing, only what the C standard specifies is guaranteed.

Edit: Removed flawed stack/heap location for "pp->" since the pointed to struct can be on both heap and stack.

josefx
The location of the variable has *nothing* to do with the addressing mode used to access it. With `pp->` the structure could be anywhere, heap, stack, “global”. With `p.` the structure can be in any of the above too except for heap memory ('cos you can't declare variables there directly). I always like to think of pointers as being like arrows, pointing to arrays of memory cells. As analogies go, it seems easy and not that far wrong.
Donal Fellows
@Donal Fellows you are right about `pp->`; I only thought about the simplest use case. If `age` were a struct, you could address it with `pp->age.days`, so the `.` doesn't say much about the memory location either.
josefx
@josefx: That's true. The difference isn't related to location, but rather whether you've got the name of a thing (i.e., a pointer to it) or the thing itself (the memory cell/structure/array/…; in Kantian philosophy “Ding an sich”).
Donal Fellows
A: 

This is a question I've always asked myself.

v.x, the member operator, is valid only for structs. v->x, the member of pointer operator, is valid only for struct pointers.

So why have two different operators, since only one is needed? For example, only the . operator could be used; the compiler always knows the type of v, so it knows what to do: v.x if v is a struct, (*v).x if v is a struct pointer.

I have three theories:

  • temporary shortsightedness by K&R (which theory I'd like to be false)
  • making the job easier for the compiler (a practical theory, given the conception time of C :)
  • readability (which theory I prefer)

Unfortunately, I don't know which one (if any) is true.

ΤΖΩΤΖΙΟΥ