If we can use pointers and malloc to create and use arrays, why does the array type exist in C?

Isn't it unnecessary if we can use pointers instead?

Thanks.

+6  A: 

Arrays are faster than dynamic memory allocation.

Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.

Also, C does not mandate that malloc() and friends are available in free-standing implementations.


Edit

Example of array

#include <stddef.h>

#define DECK_SIZE 52

void play(int *deck, size_t n);   /* defined elsewhere */

int main(void) {
    int deck[DECK_SIZE];
    play(deck, DECK_SIZE);
    return 0;
}

Example of malloc()

#include <stdlib.h>

void play(int *deck, size_t n);   /* defined elsewhere */

int main(void) {
    size_t len = 52;
    int *deck = malloc(len * sizeof *deck);
    if (deck) {
        play(deck, len);
    }
    free(deck);
    return 0;
}

In the array version, the compiler reserves space for the deck array when the program is built (though of course the memory is only occupied while the program runs); in the malloc() version, space for the deck has to be requested on every run of the program.

Arrays can never change size; malloc'd memory can grow (via realloc()) when needed.

If you only need a fixed number of elements, use an array (within the limits of your implementation). If you need memory that can grow or shrink during the running of the program, use malloc() and friends.

pmg
The allocation of both happens at run-time. The difference is that C allows you to provide the size at run-time for one but not the other.
AraK
Can you give me more details on what happens at compile time and run time when an array is allocated? An example would be helpful. Thanks.
tsubasa
@AraK, I think by allocation at run-time he means that the software has to request the memory from the OS (with the call to `malloc`), as opposed to just adjusting the stack-pointer (and thus "allocated" at "compile time").
mrduclaw
+1  A: 

Arrays have their uses and should be used when you can: static allocation helps make programs more stable, and at times it is a necessity, since it ensures certain memory leaks can't happen.

They exist because some requirements require them.

In a language such as BASIC, the set of allowed commands is fixed and known in advance, as part of the language definition. So what would be the benefit of using malloc to create the arrays and then filling them in from strings?

If I have to define the names of the operations anyway, why not put them into an array?

C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.

An array is, in a sense, shorthand for a pointer to the beginning of a block of memory, such as one returned by malloc.

But imagine trying to do matrix math by using pointer manipulation rather than vec[x] * vec[y]. It would be very prone to difficult-to-find errors.

James Black
+5  A: 

It's not a bad question. In fact, early C had no array types.

Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.

Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.

You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.

EDIT, since tsubasa has asked for more details on how this allocation occurs:

On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)

In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.

Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.

Artelius
"worry about accidentally freeing something twice" - freeing a null pointer is a no-op as defined by the standard, i.e., it is safe. Now, the fact that you are calling free on a null pointer probably indicates another problem, but the call itself is safe.
Ed Swangren
@Ed Swangren: while a common coding practice is to set pointer variables to NULL after freeing them: A) not everyone does this, and B) in data structures which can have multiple pointers pointing to the same thing, this can't prevent the problem
Artelius
Multi-dimensional arrays seem hard or impossible to do with something like `_alloca`, though, without some overhead that C apparently wasn't ready to accept. Also, `sizeof` wouldn't be quite as easy to implement while still yielding a compile-time result (the compiler would basically have to back-track the pointer to see what initializer was used for it, and then decide whether to yield the sizeof of the pointer or of the "array").
Johannes Schaub - litb
@litb: multidimensional arrays are not hard to do, some extra burden of work goes onto the programmer (i.e. keeping track of the array dimensions themselves) but there is no performance penalty. See my comment to your answer.
Artelius
A: 

Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?

omouse
In C, arrays and pointers both use the same method of working out how many bytes to move by.
Artelius
@Artelius, Not at all. Arrays use their type to know what offset to use upon indexing and their own address to know the base address to start, while pointers use the address stored in them to know the base address, and they don't know about offsets (you have to tell them manually). +1 :)
Johannes Schaub - litb
@litb: What are you talking about? Arrays and pointers *both* use their type signature to know what offset from the base address to use (though you're right that the source of this base address is different).
Artelius
Well, yes, but when you have a multi-dim array `int[2][2]`, advancing by `sizeof(int)` is wrong: when you do `a[1]`, you have to advance by `sizeof(int) * 2` :)
Johannes Schaub - litb
IMO, the phrase "using the wrong byte size" implies that omouse believes that pointer arithmetic is always done in bytes (a misconception among many C beginners - I thought this was the case myself at one point). Multi-dim arrays are an issue but I don't think they were the issue omouse had in mind. If he had something else in mind he is welcome to make it clear.
Artelius
+1  A: 

See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea; I have worked with C libraries that are completely devoid of malloc() and friends.

You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.

It's also important (as others have pointed out) to understand what is part of C proper and what extends it via various uniform standards (i.e. POSIX).

Tim Post
A: 

Explanation by Dennis Ritchie about C history:

Embryonic C

NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by

int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];

The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.

Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.

These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as

struct {
    int   inumber;
    char  name[14];
};

I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?

The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.

To summarize in my own words: if name above were just a pointer, every instance of that struct would contain an additional pointer, destroying the perfect mapping of the struct onto an external object (like a directory entry).

Johannes Schaub - litb