
As Joel points out in Stack Overflow podcast #34, The C Programming Language (aka K&R) mentions this property of arrays in C: a[5] == 5[a]

Joel says that it's because of pointer arithmetic, but I still don't understand. Why does a[5] == 5[a]?

Edit: The accepted answer is great. For a lower-level view of how this works, see the comments section on that answer; there's a phenomenal conversation there about it. (This edit was written about the comments available at the time, i.e., the first ~16.)

+265  A: 

Because a[5] will evaluate to:

*(a + 5)

and 5[a] will evaluate to:

*(5 + a)

and from elementary school math we know those are equal.

This is the direct artifact of arrays behaving as pointers: "a" is a memory address, and "a[5]" is the value 5 elements further along from "a". The address of this element is "a + 5". This is equal to going "5" elements from the beginning of the address space and then offsetting by "a" (5 + a).
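A minimal C sketch of the equivalence (the function names are hypothetical, chosen for illustration). Because a[i] is defined as *(a + i), writing i[a] produces *(i + a), which is the same address:

```c
#include <stdbool.h>

/* a[i] is defined as *(a + i), so i[a] is *(i + a): the same address. */
int subscript_normal(const int *a, int i)  { return a[i]; }
int subscript_swapped(const int *a, int i) { return i[a]; }

/* Returns true when every spelling reads the same element. */
bool all_spellings_agree(const int *a, int i)
{
    return a[i] == i[a]
        && a[i] == *(a + i)
        && a[i] == *(i + a)
        && subscript_normal(a, i) == subscript_swapped(a, i);
}
```

With int a[6] = {0, 10, 20, 30, 40, 50}, subscript_swapped(a, 5) reads the same element as a[5], namely 50.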

Mehrdad Afshari
I wonder if it isn't more like *((5 * sizeof(a)) + a). Great explanation, though.
John MacIntyre
Yeah. I was going to mention the size, but thought it would just complicate things and obscure the core idea.
Mehrdad Afshari
I'm totally anal ... so I couldn't resist. ... the assignment operator in the title is also driving me bananas ... but I'm not going to be that big of a knob. ;-)
John MacIntyre
Sorry that the "assignment operator" is driving you nuts; however, I'm asking about mathematical equivalency, not writing a code snippet, so the equals sign is correct. Thanks for the answers!
Dinah
John, no, the sizeof isn't needed. The offset is automatically scaled by the sizeof.
Johannes Schaub - litb
@litb: He means it from a low level standpoint
Mehrdad Afshari
Why is sizeof() taken into account? I thought 'a' points to the beginning of the array (i.e., the 0 element). If this is true, you only need *(a + 5). My understanding must be incorrect. What's the correct reason?
Dinah
If you have an array of 4-byte integers, the addresses &a[1] and &a[0] are 4 bytes apart.
Treb
Dinah, no, your understanding is correct. Only *(a + 5) is needed. The compiler translates it into assembly code that offsets the address of a's first element by (5 * sizeof a[0]) bytes.
Johannes Schaub - litb
@Dinah: From a C-compiler perspective, you are right. No sizeof is needed, and those expressions I mentioned are THE SAME. However, the compiler will take sizeof into account when producing machine code. If a is an int array, a[5] will compile to something like mov eax, [ebx+20] instead of [ebx+5].
Mehrdad Afshari
@Dinah: A is an address, say 0x1230. If a was a 32-bit int array, then a[0] is at 0x1230, a[1] is at 0x1234, a[2] at 0x1238 ... a[5] at 0x1244, etc. If we just add 5 to 0x1230, we get 0x1235, which is wrong.
James Curran
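A small sketch (hypothetical helper name) that measures the byte offset the compiler actually applies. With a 4-byte int, which is typical but not guaranteed by the standard, element 5 sits 20 bytes (0x14) past element 0, matching the 0x1230 to 0x1244 example:

```c
#include <stddef.h>

/* Byte distance from the start of the array to element i.
   The subscript counts elements; the compiler scales by sizeof *a. */
ptrdiff_t byte_offset_of(const int *a, int i)
{
    return (const char *)&a[i] - (const char *)a;
}
```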
The funny thing about SO is that this answer is more upvoted than this one: http://stackoverflow.com/questions/381171/help-me-understand-this-javascript-exploit#381205 !
Mehrdad Afshari
Mehrdad, I think that is because we comment on this all the time, so it gets pushed up in the list of recent discussions and then people vote :) So if you want more points, it's always a good idea to comment on your own answer and then remove the comment, haha
Johannes Schaub - litb
@litb: Nice loophole. The problem is my 200 daily rep was full before I answered them. No rep from upvotes of either of them! It's fun though ;)
Mehrdad Afshari
@James: bingo. That's what I needed to see. I kept seeing sizeof() and thinking count() and getting mightily confused. Not my brightest moment. Thank you!
Dinah
@Dinah: the assignment operator comment was just a tongue-in-cheek comment about how anal I am. ;-) ... I knew what you meant, and I'm sure everybody else did as well. Great question, btw; I was just listening to the SO podcast where they were talking about it.
John MacIntyre
So in the 5[a] case, the compiler is smart enough to use "*((5 * sizeof(a)) + a)" and not "*(5 + (a * sizeof(5)))"? Note: I guess so. I tried this in GCC and it worked.
Harvey
@sr105: That's a special case for the + operator, where one of the operands is a pointer and the other an integer. The standard says that the result will be of the type of the pointer. The compiler /has to be/ smart enough.
aib
Mehrdad, I think the comment trick doesn't work anymore. If one comments, the post doesn't float to the top :/
Johannes Schaub - litb
comments never floated to the top in my memory
johnc
When you add an integer to a pointer, the compiler knows what type the pointer points to (so if a is an int*, it's 4 bytes or whatever...), so it can perform the arithmetic correctly. Basically, if you do "p++", then p should be adjusted to point to the next object in memory. "p++" is basically equivalent to "p = p + 1", so the definition of pointer addition makes everything line up. Also note you can't do arithmetic with pointers of type `void*`.
araqnid
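To illustrate the p++ point, here is a sketch (hypothetical function name) that walks an array with p++; each step lands on the next int, just as p = p + 1 would:

```c
#include <stddef.h>

/* Sums n ints by stepping a pointer with p++. Each increment moves
   p to the next int (sizeof(int) bytes), exactly like p = p + 1. */
long sum_with_increment(const int *a, size_t n)
{
    long total = 0;
    for (const int *p = a; p != a + n; p++)
        total += *p;
    return total;
}
```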
I would so much like to upvote it, but the "This is the direct artifact of arrays being pointers" disturbs the otherwise so good answer :( I suspect if you said "... of arrays being converted to pointers", more people including me would upvote.
Johannes Schaub - litb
@litb: I understand your concern about potentially "misleading" people. However, I wanted to keep the answer simple, since in this context the array decays to a pointer. I changed "being a pointer" to "behaving as pointers." I hope that's OK. Thanks for the comment, btw.
Mehrdad Afshari
Great, thanks :)
Johannes Schaub - litb
http://freeworld.thc.org/root/phun/unmaintain.html mentions this as a good tactic for obfuscation, giving the example `myfunc(6291, 8)[Array];` where `myfunc` is simply the modulo function (that's equivalent to `Array[3]`)
fahadsadah
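The obfuscation trick can be reproduced in a few lines. This is a sketch: the linked page's myfunc is described only as the modulo function, so the version here is an assumed stand-in. Since 6291 % 8 is 3, the expression reads Array[3]:

```c
/* Assumed stand-in for the page's `myfunc`: plain modulo. */
int myfunc(int x, int m) { return x % m; }

/* myfunc(6291, 8) is 3, so this is 3[Array], i.e. Array[3]. */
int obfuscated_read(const int *Array)
{
    return myfunc(6291, 8)[Array];
}
```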
+53  A: 

Because arrays are defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.
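One way to see that a[i] is literally defined in terms of pointer addition is to spell the definition out by hand. The SUBSCRIPT macro below is hypothetical, written only to mirror *(a + i); because + commutes, it accepts its operands in either order, just like the built-in []:

```c
/* Hypothetical macro spelling out the definition a[i] == *(a + i). */
#define SUBSCRIPT(x, i) (*((x) + (i)))

/* Reads a[i] both ways; the swapped form is stored through *swapped. */
int read_both_orders(const int *a, int i, int *swapped)
{
    *swapped = SUBSCRIPT(i, a);  /* *(i + a), i.e. i[a] */
    return SUBSCRIPT(a, i);      /* *(a + i), i.e. a[i] */
}
```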

David Thornley
The best answer here :)
gramm
Excellent answer!
jdecuyper
+45  A: 

And, of course

 "ABCD"[2] == 2["ABCD"] == 'C'

The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"

This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But if B was the same object as A, then an assembly-level optimization was available. The compiler wasn't bright enough to recognize it, so the developer had to make it explicit (A += C). Similarly, if C was 1, a different assembly-level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recent compilers do, so those syntaxes are largely unnecessary these days.)
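Back to the string-literal example at the top of this answer: a string literal is an array of char, so it can be subscripted from either side. A minimal sketch (hypothetical function name); note that in real C the chained a == b == c form would compare 'C' against the 1 produced by the first comparison, so the code joins the two tests with &&:

```c
#include <stdbool.h>

/* "ABCD"[2] is *("ABCD" + 2) and 2["ABCD"] is *(2 + "ABCD"). */
bool literal_subscript_commutes(void)
{
    return "ABCD"[2] == 2["ABCD"] && 2["ABCD"] == 'C';
}
```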

James Curran
Actually, that evaluates to false; the first term "ABCD"[2] == 2["ABCD"] evaluates to true, or 1, and 1 != 'C' :D
Jonathan Leffler
@Jonathan: the same ambiguity led to the editing of the original title of this post. Are the equals signs mathematical equivalency, code syntax, or pseudo-code? I argue mathematical equivalency, but since we're talking about code, we can't escape that we're viewing everything in terms of code syntax.
Dinah
Isn't this a myth? I mean that the += and ++ operators were created to simplify for the compiler? Some code gets clearer with them, and it is useful syntax to have, no matter what the compiler does with it.
Thomas Padron-McCarthy
+= and ++ have another significant benefit: if evaluating the left-hand side changes some variable, the change is done only once. a = a + ...; does it twice.
Johannes Schaub - litb
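To make litb's point concrete, here is a sketch with a hypothetical helper pick() whose side effect is visible: it counts its own calls. The compound form evaluates the left-hand side once; the spelled-out form writes, and evaluates, it twice:

```c
/* Hypothetical helper: counts its calls and returns &x[2]. */
static int pick_calls = 0;

static int *pick(int *x)
{
    pick_calls++;
    return &x[2];
}

/* Reports how many times pick() ran for each form. */
void count_evaluations(int *x, int *compound, int *spelled_out)
{
    pick_calls = 0;
    *pick(x) += 10;            /* left operand evaluated once */
    *compound = pick_calls;

    pick_calls = 0;
    *pick(x) = *pick(x) + 10;  /* pick() runs twice */
    *spelled_out = pick_calls;
}
```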
I've heard that += reduces the odds of mistakes, as you write two variable names rather than three...
Liran Orevi
a = a + with objects often leads to unoptimized copies of the objects, because it has to make a copy of a. a += does not need a copy, it is evaluated directly.
Hooked
Doesn’t "ABCD"[2] resolve to "CD"? If you want it to resolve to 'C', you’d have to use dereferencing, i.e. `*("ABCD"[2]) == 'C'`
knittl
No - "ABCD"[2] == *("ABCD" + 2) = *("CD") = 'C'. Dereferencing a string gives you a char, not a substring
MSalters
ah, yes. you’re right. i missed the part that `[x]` already dereferences.
knittl
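A short sketch (hypothetical function name) of the distinction MSalters draws: "ABCD" + 2 is a pointer to the tail "CD", while [2] also dereferences, giving the single char 'C':

```c
#include <stdbool.h>
#include <string.h>

/* "ABCD" + 2 points at the tail "CD"; *tail is the char 'C'. */
bool tail_vs_element(void)
{
    const char *tail = "ABCD" + 2;
    return strcmp(tail, "CD") == 0   /* pointer: the substring */
        && *tail == 'C'              /* dereferenced: one char */
        && "ABCD"[2] == *tail;       /* [2] does the dereference */
}
```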
+10  A: 

Dinah: Why is sizeof() taken into account? I thought 'a' points to the beginning of the array (i.e., the 0 element). If this is true, you only need *(a + 5). My understanding must be incorrect. What's the correct reason?

In pointer arithmetic, the size of the item pointed to by the pointer is accounted for. So

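#include <stdio.h>

int main(void) {
    char cbuf[2];
    double dbuf[2];
    char *pch = cbuf;      /* start at element 0 */
    double *pdbl = dbuf;
    pch++;                 /* moves sizeof(char) == 1 byte */
    pdbl++;                /* moves sizeof(double) bytes */
    printf("%td\n", (char *)pch - (char *)cbuf);
    printf("%td\n", (char *)pdbl - (char *)dbuf);
    return 0;
}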

(on my machine) will print

1
8

It's the reason we can subtract two pointers and get the count of items between them rather than the number of bytes between them. It prevents us from having to put sizeof(T) everywhere in our code.
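A sketch of that subtraction rule (the helper names are hypothetical): the difference of two pointers into the same array counts elements, while casting both to char * first exposes the raw byte distance:

```c
#include <stddef.h>

/* Pointer subtraction yields a count of elements, not bytes. */
ptrdiff_t elements_between(const double *first, const double *last)
{
    return last - first;
}

/* Casting to char * first gives the distance in bytes instead. */
ptrdiff_t bytes_between(const double *first, const double *last)
{
    return (const char *)last - (const char *)first;
}
```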

In a lot of ways, you can think of pointer arithmetic as array arithmetic. But I probably shouldn't have said that. :-)

Frank Krueger
I sometimes believe that the H at the end of my name is invisible. Even family members often omit it.
Dinah
My apologies, I wrote that pretty quickly!
Frank Krueger
No worries. Seriously, I promise you I'll get a Christmas card this year to "Dina." Gracias for the edit.
Dinah
+7  A: 

Nice question/answers.

Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.

Consider the following declarations:

int a[10];
int* p = a;

In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.
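The difference between a and p shows up directly with sizeof, which sees the whole array for a but only the pointer for p. Taking &a also yields a pointer to the whole array, so &a + 1 steps past all ten ints. A minimal sketch reusing the declarations above (the helper function names are hypothetical):

```c
#include <stddef.h>

int a[10];
int *p = a;   /* a decays to &a[0] here */

/* sizeof sees the whole array for a, but only the pointer for p. */
size_t size_of_array(void)   { return sizeof a; }
size_t size_of_pointer(void) { return sizeof p; }

/* &a has type int (*)[10], so &a + 1 advances by the whole array. */
ptrdiff_t whole_array_step(void)
{
    return (char *)(&a + 1) - (char *)&a;
}
```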

PolyThinker
No, technically they are not the same. If you define some b as int*const and make it point to an array, it is still a pointer, meaning that in the symbol table, b refers to a memory location that stores an address, which in turn points to where the array is.
PolyThinker
+8  A: 

One thing no-one seems to have mentioned about Dinah's problem with sizeof:

You can only add an integer to a pointer; you can't add two pointers together. That way, when adding a pointer to an integer or an integer to a pointer, the compiler always knows which operand has a size that needs to be taken into account.
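A sketch of this point using C11's _Generic, which inspects the type of an expression without evaluating it. IS_INT_PTR is a hypothetical macro; it confirms that int + pointer and pointer + int both produce an int *, which is why the compiler can scale the integer operand whichever side it is on. (Adding two pointers, by contrast, is a compile-time error, so it isn't shown.) This assumes a C11 compiler.

```c
/* Hypothetical macro: 1 if e has type int *, else 0 (e is not evaluated). */
#define IS_INT_PTR(e) _Generic((e), int *: 1, default: 0)

int sum_is_pointer(void)
{
    int arr[8] = {0};
    /* Both orders give an int * to the same element. */
    return IS_INT_PTR(arr + 5) && IS_INT_PTR(5 + arr) && (arr + 5 == 5 + arr);
}
```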

There's a fairly exhaustive conversation about this in the comments of the accepted answer. I referenced said conversation in the edit to the original question but did not directly address your very valid concern of sizeof. Not sure how to best do this in SO. Should I make another edit to the orig. question?
Dinah