What's the advantage of using malloc (besides the NULL return on failure) over static arrays? The following program will eat up all my RAM and start filling swap, but only if the loops are uncommented. It does not crash.

...

#include <stdio.h>

/* note: binary - binds tighter than <<, so these parse as 1 << 28 ints
   (1 GB) and 1 << 30 chars (1 GB), not (1 << 29) - 1 and (1 << 31) - 1 */
unsigned int bigint[ 1 << 29 - 1 ];
unsigned char bigchar[ 1 << 31 - 1 ];

int main (int argc, char **argv) {
  int i;
/*   for (i = 0; i < 1 << 29 - 1; i++) bigint[i] = i; */
/*   for (i = 0; i < 1 << 31 - 1; i++) bigchar[i] = i & 0xFF; */

  getchar();
  return 0;
}

...

After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit? Apparently I can have as many of them as I want. It will segfault, but only if I ask for (and try to use) more than malloc would give me anyway.

Is there a way to determine if a static array was actually allocated and safe to use?

EDIT: I'm interested in why malloc is used to manage the heap instead of letting the virtual memory system handle it. Apparently I can size an array to many times what I think I'll need, and the virtual memory system will keep in RAM only what is actually touched. If I never write to, e.g., the end (or beginning) of these huge arrays, then the program doesn't use the physical memory. Furthermore, if I can write to every location, then what does malloc do besides increment a pointer in the heap or search around previous allocations in the same process?
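
A minimal way to see this behaviour directly, assuming a Linux system with memory overcommit enabled: allocate a big block, touch only one byte, and watch the resident set stay small even though the virtual size is huge.

#include <stdio.h>
#include <stdlib.h>

int main (void) {
  /* ask for 1 GB; with overcommit the kernel only reserves address space */
  size_t size = (size_t)1 << 30;
  unsigned char *big = malloc(size);
  if (big == NULL) {
    perror("malloc");
    return 1;
  }
  big[0] = 1;  /* only this one page gets backed by physical memory */
  printf("virtual size is ~1 GB; check RSS in top while paused\n");
  getchar();
  free(big);
  return 0;
}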

+5  A: 

This is called custom memory management, I guess. You can do that, but you'll have to manage that chunk of memory yourself. You'd wind up writing your own malloc() working over this chunk.
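
A minimal sketch of what that could look like, a trivial "bump" allocator handing out pieces of a static array (the names here are illustrative; a real allocator would also handle alignment, freeing, and fragmentation):

#include <stddef.h>

static unsigned char pool[1 << 20];  /* the static chunk being managed */
static size_t pool_used = 0;

/* Hand out the next n bytes of the pool; NULL when exhausted.
   No free() and no alignment: the simplest possible "malloc". */
void *pool_alloc (size_t n) {
  if (n > sizeof pool - pool_used)
    return NULL;
  void *p = pool + pool_used;
  pool_used += n;
  return p;
}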

alamar
+10  A: 

With malloc you can grow and shrink your array: it becomes dynamic, so you can allocate exactly what you need.
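
For example, a buffer can be grown on demand with realloc instead of being sized for the worst case up front; a short sketch:

#include <stdio.h>
#include <stdlib.h>

int main (void) {
  size_t capacity = 16;
  int *a = malloc(capacity * sizeof *a);
  if (a == NULL) return 1;

  for (size_t i = 0; i < 1000; i++) {
    if (i == capacity) {           /* out of room: double the allocation */
      int *tmp = realloc(a, 2 * capacity * sizeof *a);
      if (tmp == NULL) { free(a); return 1; }
      a = tmp;
      capacity *= 2;
    }
    a[i] = (int)i;
  }
  printf("stored 1000 ints in a buffer of %zu slots\n", capacity);
  free(a);
  return 0;
}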

dfa
+2  A: 

Regarding:

After some trial and error I found the above is the largest static array allowed on my 32-bit Intel machine with GCC 4.3. Is this a standard limit, a compiler limit, or a machine limit?

One upper bound will depend on how the 4GB (32-bit) virtual address space is partitioned between user space and kernel space. For Linux, I believe the most common partitioning scheme gives a 3GB range of addresses to user space and a 1GB range to kernel space. The partitioning is configurable at kernel build time; 2GB/2GB and 1GB/3GB splits are also in use. When the executable is loaded, virtual address space must be allocated for every object, regardless of whether real memory is allocated to back it up.

Lance Richardson
+2  A: 

You may be able to allocate that gigantic array in one context but not others, for example if your array is a member of a struct that you want to pass around. Some environments have a 32K limit on struct size.

As previously mentioned, you can also resize your allocation to use exactly what you need. In performance-critical contexts it is important to avoid paging out to swap if at all possible.

Timothy Place
+1  A: 

You really should avoid doing this unless you know what you're doing. Try to request only as much memory as you need. Even if the excess is never used and never gets in the way of other programs, it can break the process itself, for two reasons. First, on certain systems, particularly 32-bit ones, it can exhaust the address space prematurely. Second, many kernels impose some kind of per-process limit on reserved (virtual, not-in-use) memory; if the program asks for a reservation that exceeds this limit, the allocation fails or the kernel kills the process. I've seen programs crash or exit due to a failed malloc because they were reserving GBs of memory while only using a few MB.
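
On POSIX systems you can inspect that per-process cap with getrlimit; a small sketch, assuming RLIMIT_AS (total address space) is the limit in effect:

#include <stdio.h>
#include <sys/resource.h>

int main (void) {
  struct rlimit rl;
  /* RLIMIT_AS caps the process's total virtual address space; an
     allocation that would exceed it fails with NULL from malloc */
  if (getrlimit(RLIMIT_AS, &rl) == 0)
    printf("address-space limit: soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
  return 0;
}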

+2  A: 

There is no way to free a stack allocation other than going out of scope. So once you actually use a global allocation and the VM has to give it real physical memory, that memory is allocated and stays there until your program exits. This means any such process can only grow its virtual memory use (functions have local stack allocations, and those do get "freed" on return).

You cannot keep stack memory once it goes out of the function's scope; it is always freed. So you must know at compile time how much memory you will use.

Which then boils down to how many int foo[1 << 29]'s you can have. Since the first one takes up half of a 32-bit address space (let's say it starts at 0x00000000), the second will end at 0xffffffff or thereabouts, and a third would have to resolve to addresses a 32-bit pointer cannot express. (Remember that stack reservations are resolved partly at compile time and partly at run time, via offsets: how far the stack pointer is moved when you allocate this or that variable.)

So the answer is pretty much that once you have int foo[1 << 29], you can't have any reasonable depth of function calls with other local stack variables any more.
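
To make the scoping point concrete, a small sketch (the function names are made up for illustration):

#include <stdlib.h>

/* WRONG: buf lives on the stack and is freed when the function returns. */
int *make_array_stack (void) {
  int buf[100];
  return buf;  /* dangling pointer; compilers typically warn here */
}

/* OK: heap memory outlives the call; the caller frees it explicitly. */
int *make_array_heap (void) {
  return malloc(100 * sizeof(int));
}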

Pasi Savolainen
I had to read it a couple of times, but you make a good point about the range of a 32-bit pointer. I found I could fill two 1<<31-1 sized arrays without a segfault, and that's a full enumeration of all possible pointers minus NULL. That means I was writing to 0x4, 0x8, and other places where it normally segfaults. I wonder what happened to the kernel in this experiment.
Samuel Danielson
+5  A: 

Well, for two reasons really:

  1. Portability: some systems won't do the virtual memory management for you.

  2. You'll inevitably need to divide this array into smaller chunks for it to be useful, then keep track of all the chunks; eventually, as you start "freeing" the chunks you no longer require, you'll hit the problem of memory fragmentation.

All in all, you'll end up implementing a lot of memory-management functionality (pretty much reimplementing malloc, in fact) without the benefit of portability.

Hence the reasons:

  • Code portability via memory management encapsulation and standardisation.

  • Personal productivity enhancement by way of code reuse.

Totophil