I need a language lawyer with authoritative sources.

Take a look at the following test program which compiles cleanly under gcc:

#include <stdio.h>


void foo(int *a) {
    a[98] = 0xFEADFACE;
}

void bar(int b[]) {
    *(b+498) = 0xFEADFACE;
}

int main(int argc, char **argv) {

    int a[100], b[500], *a_p;

    *(a+99) = 0xDEADBEEF;
    *(b+499) = *(a+99);

    foo(a);
    bar(b);

    printf("a[98] == %X\na[99] == %X\n", a[98], a[99]);
    printf("b[498] == %X\nb[499] == %X\n", b[498], b[499]);

    a_p = a+98;
    *a_p = 0xDEADFACE;

    printf("a[98] == %X\na[99] == %X\n", a[98], a[99]);

}

It produces the output I expect:

anon@anon:~/study/test_code$ gcc arrayType.c -o arrayType
anon@anon:~/study/test_code$ ./arrayType 
a[98] == FEADFACE
a[99] == DEADBEEF
b[498] == FEADFACE
b[499] == DEADBEEF
a[98] == DEADFACE
a[99] == DEADBEEF

Are a and b the same type? Is int *a handled as the same type as int a[] internally in the compiler?

From a practical point of view int a[100], b[500], *a_p, b_a[]; all seem to be the same type. It's hard for me to believe that the compiler is constantly adjusting these types in the various circumstances in my above example. I'm happy to be proven wrong.

Can someone settle this question for me definitively and in detail?

A: 

Are a and b the same type?

Yes. [Edit: I should clarify: the parameter a of function foo has the same type as the parameter b of function bar. Both are pointers to int. The local variables a and b in main are both arrays of int (strictly speaking they are not the same type, because int[100] and int[500] differ in size, but both are arrays).]

Is int *a handled as the same type as int a[] internally in the compiler?

Usually not. The exception is a function parameter: when you write foo bar[] as a parameter (like you did here), it is automatically adjusted to foo *bar.

When declaring non-parameter variables, however, there is a big difference.

int * a; /* pointer to int. points nowhere in particular right now */
int b[10]; /* array of int. Memory for 10 ints has been allocated on the stack */
foo(a); /* calls foo with parameter `int*` */
foo(b); /* also calls foo with parameter `int*` because here the name b
           is converted to a pointer to the first element of the array */
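
To make the parameter-versus-variable distinction concrete, here is a minimal C11 sketch. The function and macro names (takes_pointer, takes_array_syntax, IS_FN_TAKING_INT_PTR, demo_parameter_adjustment) are made up for illustration, and it assumes a compiler that supports _Generic.

```c
#include <assert.h>

/* Two spellings of the same parameter type: `int b[]` is adjusted to `int *b`. */
void takes_pointer(int *a) { a[0] = 1; }
void takes_array_syntax(int b[]) { b[0] = 2; }

/* Evaluates to 1 if f has the function type void(int *), else 0. */
#define IS_FN_TAKING_INT_PTR(f) \
    _Generic(&(f), void (*)(int *): 1, default: 0)

/* Returns 1 if both functions have exactly the same type. */
int parameters_have_same_type(void) {
    return IS_FN_TAKING_INT_PTR(takes_pointer) &&
           IS_FN_TAKING_INT_PTR(takes_array_syntax);
}

/* As local variables the two declarations differ: b is an array object
   holding 10 ints, a is a pointer object holding one address. */
int local_sizes_as_expected(void) {
    int b[10];
    int *a = b; /* b is converted to &b[0] here */
    return sizeof b == 10 * sizeof(int) && sizeof a == sizeof(int *);
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_parameter_adjustment(void) {
    assert(parameters_have_same_type());
    assert(local_sizes_as_expected());
}
```

Both checks are decided at compile time; nothing is adjusted at run time.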
sepp2k
Could you give details, sources? Why would a compiler not handle `int *a;` and `int a[];` the same? As far as I can tell they are *always* interchangeable;
Robert S. Barnes
sepp2k
`int b[10]` allocates space for 10 ints and creates a pointer to the array. `int b[]` would only create a pointer that points to nowhere, just like `int * a`.
awe
I understand all that. My point is that in practice `a[100], b[500], *a_p, b_a[];` are all treated as int pointers. Specifically, they are all just a memory location containing a memory address. Whatever notation is used to access the data you always get memory address + sizeof(int) * subscript then a dereference to get at the data. The only difference is that a and b are effectively const pointers since the block of memory they point to can't be changed. Other than that they're all the same.
Robert S. Barnes
Robert, this is C through and through. Even with structs, all you really get is a block of memory to play with.
BenB
+6  A: 

Are a and b the same type? Is int *a handled as the same type as int a[] internally in the compiler?

From the comp.lang.C FAQ:

... whenever an array appears in an expression, the compiler implicitly generates a pointer to the array's first element, just as if the programmer had written &a[0]. (The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal initializer for a character array...)

... Given an array a and pointer p, an expression of the form a[i] causes the array to decay into a pointer, following the rule above, and then to be subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different ...

Given declarations of

char a[] = "hello";
char *p = "world";

... when the compiler sees the expression a[3], it emits code to start at the location a, move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to. In other words, a[3] is three places past (the start of) the object named a, while p[3] is three places past the object pointed to by p.

Emphasis is mine. The biggest difference seems to be that the pointer is fetched when it's a pointer, while there is no pointer to fetch if it's an array.
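
A small sketch of that difference, using the same two declarations as the FAQ. The function names are mine, purely for illustration: the array's name designates its own storage, while the pointer is a separate object that holds an address.

```c
#include <assert.h>

/* For an array, &a and &a[0] differ in type but are the same address:
   there is no separate pointer object to fetch. */
int array_name_is_its_storage(void) {
    char a[] = "hello";
    return (const void *)&a == (const void *)&a[0];
}

/* For a pointer, &p is where the pointer itself lives, and p is the
   address stored inside it. These are two different locations. */
int pointer_is_separate_object(void) {
    const char *p = "world";
    return (const void *)&p != (const void *)p;
}

/* The subscript expressions look identical in the source either way. */
int both_subscripts_work(void) {
    char a[] = "hello";
    const char *p = "world";
    return a[3] == 'l' && p[3] == 'l';
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_array_vs_pointer_storage(void) {
    assert(array_name_is_its_storage());
    assert(pointer_is_separate_object());
    assert(both_subscripts_work());
}
```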

Mark Rushakoff
This quote from your link says it most concisely: "it's pointer arithmetic and array indexing [that] are equivalent in C, pointers and arrays are different."
Robert S. Barnes
It seems like arrays are more efficient as they save a memory fetch over pointers. It seems that `a` is itself the address of 'h' whereas `p` is the address of the location containing the address of 'w'. That's the distinction I was trying to understand. The array `a` in that sense is actually closer to a declaration like `int a;` than to `int *a;` Now it all makes sense.
Robert S. Barnes
+2  A: 

I agree with sepp2k's answer and Mark Rushakoff's comp.lang.c FAQ quote. Let me add some important differences between the two declarations and a common trap.

  1. When you define a as an array (in a context other than a function's argument, which is a special case) you can't write a = 0; or a++; because a is not a modifiable lvalue (an expression that can appear on the left of an assignment operator).

  2. The array definition reserves space, whereas the pointer doesn't. Therefore, sizeof(array) will return the memory space needed for storing all the array's elements (for instance 10 times four bytes for an array of 10 integers on a 32-bit architecture), whereas sizeof(pointer) will only return the memory space required for storing that pointer (for instance 8 bytes in a 64-bit architecture).

  3. When you add another level of pointer or array to the declaration, things definitely diverge. For instance, int **a is a pointer to a pointer to an integer. It can be used as a two-dimensional array (with rows of varying sizes) by allocating an array of pointers to the rows and making each one point to memory for storing integers. To access a[2][3] the compiler will fetch the pointer in a[2] and then move three elements past the location it points to in order to access the value. Contrast this with int b[10][20], which is an array of 10 elements, each of which is an array of 20 integers. To access b[2][3] the compiler will offset from the beginning of the array's memory area by 2 times the size of 20 integers plus the size of 3 more integers.
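
The address arithmetic in point 3 can be checked with a short sketch. The function names are illustrative only, and the row-pointer table uses arbitrary dimensions (4 by 5) that do not come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Byte offset of b[2][3] from the start of a true two-dimensional array.
   No pointer is fetched; the offset is pure arithmetic. */
size_t offset_in_2d_array(void) {
    static int b[10][20];
    return (size_t)((char *)&b[2][3] - (char *)b);
}

/* Builds a table of row pointers, writes through a[2][3] and reads it back.
   Here a[2] is a pointer that has to be fetched first.
   Returns -1 if an allocation fails. */
int roundtrip_through_row_pointers(void) {
    const size_t rows = 4, cols = 5;
    int **a = malloc(rows * sizeof *a);
    if (a == NULL) return -1;

    size_t made = 0;
    for (; made < rows; made++) {
        a[made] = calloc(cols, sizeof **a);
        if (a[made] == NULL) break;
    }

    int result = -1;
    if (made == rows) {
        a[2][3] = 42;
        result = a[2][3];
    }

    for (size_t i = 0; i < made; i++) free(a[i]);
    free(a);
    return result;
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_2d_layouts(void) {
    assert(offset_in_2d_array() == (2 * 20 + 3) * sizeof(int));
    assert(roundtrip_through_row_pointers() == 42);
}
```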

Finally, consider this trap. If you have in one C file

int a[10];

and in another

extern int *a;
a[0] = 42;

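the files will compile and link without an error, but the code will not do what you might expect; the behavior is undefined, and in practice it will probably crash with a null pointer dereference. The reason is that in the second file a is treated as a pointer whose value is whatever bytes sit at the start of the first file's array, i.e. the contents of a[0] (and possibly a[1], where pointers are wider than int), which are initially 0.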
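
The two-file trap cannot be reproduced in a single translation unit, but its effect can be simulated. The sketch below copies the first bytes of a zero-initialized array into a pointer, which is roughly what the mismatched declaration makes the second file do. It assumes a platform where an all-zero-bits pointer is a null pointer (true on mainstream systems, not guaranteed by the standard). The names are illustrative.

```c
#include <assert.h>
#include <string.h>

/* The simulation only makes sense if a pointer fits inside the array. */
_Static_assert(sizeof(int *) <= sizeof(int[10]),
               "pointer must fit inside the array");

/* Reinterprets the start of a zero-initialized array as a pointer value.
   Returns 1 if the resulting pointer compares equal to NULL. */
int mismatched_pointer_is_null(void) {
    static int a[10];   /* zero-initialized, like a file-scope array */
    int *as_pointer;
    memcpy(&as_pointer, a, sizeof as_pointer);
    return as_pointer == NULL;
}

/* Hypothetical entry point: call this from main to run the check. */
void demo_extern_mismatch(void) {
    /* Assumption: all-zero bits represent a null pointer here. */
    assert(mismatched_pointer_is_null());
}
```

The usual way to avoid the trap is to put a single declaration, extern int a[10];, in a header that both files include, so the compiler can check each use against the definition.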

Diomidis Spinellis
A: 

No, they are not the same! One is a pointer to an int, the other is an array of 100 ints. So yes, they are the same!

OK, I'll try to explain this stupidity.

*a and a[100] are basically the same for what you are doing. But if we look in detail at the memory handling logic for the compiler, what we are saying is:

  • *a compiler, I need memory, but I'll tell you how much later, so chill for now!
  • a[100] compiler, I need memory now, and I know I need 100, so make sure we have it!

Both are pointers. And your code can treat them the same and trample the memory near those pointers all you want. But a[100] is contiguous memory from the pointer allocated at compile time, while *a only allocates the pointer because it doesn't know when you are going to need the memory (run time memory nightmares).

So, who cares, right? Well, certain operators like sizeof care. sizeof(a) will return a different answer for *a and for a[100]. And the answer will be different again inside your functions, where the array arrives as a pointer. In sizeof's case the compiler knows the difference, so you can use this to your advantage in your code too, for loops, memcpy, etc. Go on, try.
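
Taking up the "go on, try" suggestion, here is a rough sketch of what sizeof reports in the caller versus the callee. The helper names and the COUNT_OF macro are hypothetical, not from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a real array. Only valid where arr is still an array,
   not after it has been passed to a function. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Inside the callee only a pointer is visible. */
size_t size_in_callee(int *p) { return sizeof p; }

/* In the caller, sizeof sees the whole array. */
size_t size_in_caller(void) {
    int a[100] = {0};
    return sizeof a;
}

/* The same array, measured after being passed to a function. */
size_t size_after_passing(void) {
    int a[100] = {0};
    return size_in_callee(a);
}

size_t count_in_caller(void) {
    int a[100] = {0};
    return COUNT_OF(a);
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_sizeof_contexts(void) {
    assert(size_in_caller() == 100 * sizeof(int));
    assert(size_after_passing() == sizeof(int *));
    assert(count_in_caller() == 100);
}
```

This is why functions that take an array as a pointer usually take the element count as a separate argument.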

This is a huge question, but the answer I am giving here is this. The compiler knows the subtle difference, and it will produce code that will look the same most times, but different when it matters. It is up to you to find out what *a or a[100] means to the compiler and where it will treat it differently. They can be effectively the same, but they are not the same. And to make it worse, you can change the whole game by calling a function like you have.

Phew... Is it any wonder that managed code like c# is so hot right now?!

Edit: I should also add that you can do a_p = X, but try to do that with one of your arrays! Arrays work with memory just like pointers, but they can't be moved or resized. Pointers like a_p can point at different things.

BenB
They are not both pointers. In some contexts, such as passing them to a function expecting a pointer, both are treated as pointers.
Diomidis Spinellis
Agree. What did I say differently?
BenB
+3  A: 

One of the differences - int a[x][y] and int **a are not interchangeable.

http://www.lysator.liu.se/c/c-faq/c-2.html

2.10:

An array of arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer.

Adrian Panasiuk
+2  A: 

Look here:

2.2: But I heard that char a[] was identical to char *a.

http://www.lysator.liu.se/c/c-faq/c-2.html

+2  A: 

a and b are both arrays of ints. a[0] is not a memory location containing a memory address, it is a memory location containing an int.

Arrays and pointers are neither identical nor interchangeable. They are "equivalent" only in this sense: an lvalue of type array-of-T that appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resulting pointer is pointer-to-T. This becomes clear when looking at the assembly output for related code. The three exceptions, FYI, are when the array is the operand of sizeof, the operand of &, or a string literal initializer for a character array.

Picture the following declarations:

char a[] = "hello";
char *p = "world";

They result in data structures that can be represented like this:

   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+

   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+

Note that a reference like x[3] produces different code depending on whether x is a pointer or an array. For the compiler, a[3] means: start at the location a, move three past it, and fetch the char there. p[3] means: go to the location p, fetch the pointer value stored there, move three past the address it holds, and fetch the char there.
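
The three exceptions to decay listed earlier in this answer can each be checked with a few lines. This is only a sketch with illustrative function names, and it assumes a C11 compiler for _Generic.

```c
#include <assert.h>
#include <string.h>

/* Exception 1: as the operand of sizeof, the array does not decay. */
int sizeof_sees_whole_array(void) {
    char a[] = "hello";
    return sizeof a == 6; /* five letters plus the terminating '\0' */
}

/* Exception 2: as the operand of &, the result is a pointer to the whole
   array (char (*)[6]), not a pointer to a pointer. */
int address_of_yields_pointer_to_array(void) {
    char a[] = "hello";
    return _Generic(&a, char (*)[6]: 1, default: 0);
}

/* Exception 3: a string literal used as an initializer is copied into
   the array, so the array's contents can be modified. */
int literal_initializer_is_copied(void) {
    char a[] = "hello";
    a[0] = 'j';
    return strcmp(a, "jello") == 0;
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_decay_exceptions(void) {
    assert(sizeof_sees_whole_array());
    assert(address_of_yields_pointer_to_array());
    assert(literal_initializer_is_copied());
}
```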

Michael Foukarakis
A: 

I'll throw my hat into the ring for a simple explanation of this:

  • An array is a series of contiguous storage locations for the same type

  • A pointer is the address of a single storage location

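  • Using an array's name in most expressions gives the address of (i.e. a pointer to) its first element. Explicitly taking its address with & gives the same address, but typed as a pointer to the whole array.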

  • Elements of an array can be accessed through a pointer to the array's first element. This works because the subscript operator [] is defined on pointers in a way designed to facilitate this.

  • An array can be passed where a pointer parameter is expected, and it will be automatically converted into a pointer-to-first-element (although this is not recursive for multiple levels of pointers, or multi-dimensional arrays). Again, this is by design.

So, in many cases, the same piece of code can operate on arrays and contiguous blocks of memory that were not allocated as an array because of the intentionally special relationship between an array and a pointer to its first element. However they are distinct types, and they do behave differently in some circumstances, e.g. pointer-to-array is not at all the same as pointer-to-pointer.
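
As a rough illustration of that last point, the sketch below runs one function over both a real array and a heap block that was never declared as an array. All names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Works on any contiguous run of ints, however it was allocated. */
long sum_ints(const int *p, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) total += p[i];
    return total;
}

/* An array argument is converted to a pointer to its first element. */
long sum_of_array(void) {
    int a[4] = {1, 2, 3, 4};
    return sum_ints(a, sizeof a / sizeof a[0]);
}

/* A malloc'd block is only ever seen through a pointer.
   Returns -1 if the allocation fails. */
long sum_of_heap_block(void) {
    const size_t n = 4;
    int *block = malloc(n * sizeof *block);
    if (block == NULL) return -1;
    for (size_t i = 0; i < n; i++) block[i] = (int)i + 1;
    long total = sum_ints(block, n);
    free(block);
    return total;
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_shared_code(void) {
    assert(sum_of_array() == 10);
    assert(sum_of_heap_block() == 10);
}
```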

Here's a recent SO question that touches on the pointer-to-array versus pointer-to-pointer issue: http://stackoverflow.com/questions/1370749/whats-the-difference-between-abc-and-abc-in-c

Tyler McHenry
A: 

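If you only have a pointer to a character array holding a NUL-terminated string, sizeof(ptr) gives the size of the pointer, not of the array. Use strlen(ptr)+1 to get the number of bytes the string occupies; note that this is the size of the string, which may be smaller than the array that holds it.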
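
A minimal sketch of the three different sizes involved, assuming a 32-byte buffer chosen only for the example. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof on the array itself: the full buffer. */
size_t array_size(void) {
    char buf[32] = "hi";
    return sizeof buf;
}

/* strlen through a pointer: the string's bytes, including the '\0'. */
size_t string_size_via_pointer(void) {
    char buf[32] = "hi";
    const char *ptr = buf;
    return strlen(ptr) + 1;
}

/* sizeof on the pointer: just the size of an address. */
size_t pointer_size(void) {
    char buf[32] = "hi";
    const char *ptr = buf;
    return sizeof ptr;
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_sizeof_vs_strlen(void) {
    assert(array_size() == 32);
    assert(string_size_via_pointer() == 3);
    assert(pointer_size() == sizeof(char *));
}
```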

A: 

There are two a's and two b's in your example.

As parameters

void foo(int *a) {
    a[98] = 0xFEADFACE;
}

void bar(int b[]) {
    *(b+498) = 0xFEADFACE;
}

a and b are of the same type: pointer to int.

As variables

int *a;
int b[10];

aren't of the same type. The first is a pointer, the second is an array.

Array behavior

An array (whether a variable or not) is implicitly converted, in most contexts, into a pointer to its first element. The contexts in C where this is not done are as the operand of sizeof, as the operand of &, and when a string literal initializes a character array; in C++ there are some more, related to reference parameters and templates.

I wrote "a variable or not" because the conversion is not done only for variables. Some examples:

int foo[10][10];
int (*bar)[10];

  • foo is an array of 10 arrays of 10 ints. In most contexts it is converted into a pointer to its first element, of type pointer to array of 10 int.

  • foo[5] is an array of 10 int. In most contexts it is converted into a pointer to its first element, of type pointer to int.

  • *bar is an array of 10 int. In most contexts it is converted into a pointer to its first element, of type pointer to int.
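
The three conversions listed above can be checked at compile time. This is a sketch assuming C11 _Generic; adding 0 to each expression forces the array-to-pointer conversion before the type is inspected, and the row index used is an arbitrary in-range one. The function names are illustrative.

```c
#include <assert.h>

/* foo converts to a pointer to its first row: int (*)[10]. */
int foo_converts_to_pointer_to_row(void) {
    int foo[10][10];
    return _Generic(foo + 0, int (*)[10]: 1, default: 0);
}

/* A single row converts to a pointer to its first int. */
int row_converts_to_pointer_to_int(void) {
    int foo[10][10];
    return _Generic(foo[5] + 0, int *: 1, default: 0);
}

/* *bar is not a variable, but it is still an array and still converts. */
int deref_bar_converts_to_pointer_to_int(void) {
    int foo[10][10];
    int (*bar)[10] = foo;
    return _Generic(*bar + 0, int *: 1, default: 0);
}

/* Hypothetical entry point: call this from main to run the checks. */
void demo_conversion_types(void) {
    assert(foo_converts_to_pointer_to_row());
    assert(row_converts_to_pointer_to_int());
    assert(deref_bar_converts_to_pointer_to_int());
}
```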

Some history

In B, the direct ancestor of C, the equivalent of

int x[10];

had the effect of what in current C we'd write

int _x[10];
int *x = &_x[0];

i.e. it allocated memory and initialized a pointer to it. Some people seem to have the misconception that this is still true in C.

In NB (when C was no longer B but not yet called C), there was a time when a pointer was declared

int x[];

but

int foo[10];

would have the current meaning. The adjustment of function parameters is a remnant of that time.

AProgrammer
I believe `int (*bar)[10];` is a pointer to an array of 10 ints and `int *bar[10];` is an array of 10 int pointers.
Robert S. Barnes
@Robert, right. But I wrote about *bar and not about bar. My goal was to show examples on the implicit conversion array->pointer to first element, not to explain C declaration syntax.
AProgrammer
+1  A: 
John Bode
+1 You and Mark both answered different aspects of the question. If I could accept both as the answer I would, but Mark's answer got more to the root of the problem I was having, i.e. what's the difference in the generated code when accessing an array vs. a pointer. Thanks!
Robert S. Barnes