I have been writing C for only a scant few weeks and have not taken the time to worry myself too much about malloc(). Recently, though, a program of mine returned a string of happy faces instead of the true/false values I had expected from it.

If I create a struct like this:

typedef struct Cell {
  struct Cell* subcells;
} Cell;

and then later initialize it like this:

Cell makeCell(int dim) {
  Cell newCell;

  for(int i = 0; i < dim; i++) {
    newCell.subcells[i] = makeCell(dim -1);
  }

  return newCell; //ha ha ha, this is here in my program don't worry!
}

Am I going to end up accessing happy faces stored in memory somewhere, or perhaps writing over previously existing cells, or what? My question is, how does C allocate memory when I haven't actually malloc()ed the appropriate amount of memory? What's the default?

+4  A: 

Anything not allocated on the heap (via malloc and similar calls) or given static storage (globals and static variables) is, in practice, allocated on the stack. Because of that, any local object created in a function without being malloc'd is destroyed when the function ends. That includes the local copy of a returned object: when the function returns, the returned value is copied into space the caller has set aside for it, and the callee's stack frame is then discarded.

Warning: If you want to return an object that has pointers to other objects in it, make sure that the objects pointed to are created on the heap, and better yet, create that object on the heap, too, unless it's not intended to survive the function in which it is created.
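
A minimal sketch of that advice, under assumptions: the Label struct and the make_label / free_label names are hypothetical and not from the question. The struct itself is returned by value (so it is copied to the caller), while the data it points to is created on the heap and so outlives the function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct for illustration; not part of the question. */
typedef struct Label {
    char *text;   /* points at heap memory owned by the Label */
    size_t len;
} Label;

/* Returns a Label by value. The struct is copied to the caller; the
 * text it points to lives on the heap and survives the return. */
Label make_label(const char *src) {
    Label l;
    l.len = strlen(src);
    l.text = malloc(l.len + 1);
    if (l.text != NULL) {
        memcpy(l.text, src, l.len + 1);  /* copy including the '\0' */
    } else {
        l.len = 0;                       /* allocation failed */
    }
    return l;
}

/* Releases the heap memory and leaves the Label in a known state. */
void free_label(Label *l) {
    free(l->text);
    l->text = NULL;
    l->len = 0;
}
```

The caller owns the returned Label and is expected to pass it to free_label once it is done with it.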

Chris Charabaruk
+16  A: 

There is no default value for your pointer. Your pointer will point to whatever it stores currently. As you haven't initialized it, the line

newCell.subcells[i] = ...

effectively accesses some uncertain part of memory. Remember that newCell.subcells[i] is equivalent to

*(newCell.subcells + i)

If the left side contains some garbage, you will end up adding i to a garbage value and access the memory at that uncertain location. As you correctly said, you will have to initialize the pointer to point to some valid memory area:

newCell.subcells = malloc(bytecount);

After that line you can access that many bytes. As for other sources of memory, there are different kinds of storage, each with its uses. Which kind you get depends on what kind of object you have and which storage class you tell the compiler to use.

  • malloc returns a pointer to an object with no type. You can make a pointer point to that region of memory, and the object effectively takes on the type the pointer points to. The memory is not initialized to any value, and allocating it usually costs more than the other kinds of storage. Objects so obtained are called allocated objects.
  • You can place objects globally. Their memory will be initialized to zero. For pointers, you will get NULL pointers; for floats, you will get a proper zero too. You can rely on a proper initial value.
  • If you have local variables but use the static storage class specifier, then you get the same initial-value rule as for global objects. The memory is usually allocated the same way as for global objects, but that's in no way a necessity.
  • If you have local variables without any storage class specifier or with auto, then your variable will be allocated on the stack (C itself does not require a stack, but this is what compilers do in practice). You can take its address, in which case the compiler has to forgo optimizations such as keeping it in a register.
  • Local variables declared with the register storage class specifier are marked as having special storage. As a result, you cannot take their address. With recent compilers there is normally no need for register, because their optimizers are sophisticated enough. If you really know what you are doing, you may get some performance out of it, though.
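
The kinds of storage listed above can be sketched side by side. This is only an illustration; all the names (global_counter, global_ptr, count_calls, sum_to, make_filled) are hypothetical.

```c
#include <stdlib.h>

int global_counter;   /* global: static storage, starts at 0 */
int *global_ptr;      /* global pointer: starts as a NULL pointer */

/* Counts its own calls. `calls` is a static local: it is
 * zero-initialized once and keeps its value between calls. */
int count_calls(void) {
    static int calls;
    return ++calls;
}

/* `total` is an automatic variable and must be initialized explicitly;
 * without "= 0" its value would be indeterminate. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Allocated object: the contents are indeterminate until written,
 * so every element is filled in before the pointer is returned. */
int *make_filled(size_t n, int value) {
    int *p = malloc(n * sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        p[i] = value;
    }
    return p;
}
```

The caller of make_filled is responsible for calling free on the result.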

Objects have associated storage durations, which can be used to explain the different initialization rules (formally, they only define the minimum time the objects live).

Objects declared with auto and register have automatic storage duration and are not initialized. You have to initialize them explicitly if you want them to contain some value. If you do not, they will contain whatever happened to be left on the stack before their lifetime began.

Objects allocated by malloc (or another function of that family, like calloc) have allocated storage duration. Their storage is not initialized either. The exception is calloc, which initializes the memory to zero ("real" zero, i.e. all bytes 0x00, without regard to any NULL pointer representation).

Objects declared with static, and global variables, have static storage duration. Their storage is initialized to the zero appropriate for their respective type.

Note that an object need not have a type, but the only way to get a type-less object is to use allocated storage. (An object in C is a "region of storage".)
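
To illustrate the calloc remark, here is a small sketch with hypothetical helper names (zeroed_ints, null_table). Integers read from calloc'd memory are 0, but since all-bytes-zero is not formally guaranteed to be a NULL pointer, the pointer table is set to NULL explicitly.

```c
#include <stdlib.h>

/* calloc sets every byte of the block to zero, so each int reads as 0. */
int *zeroed_ints(size_t n) {
    return calloc(n, sizeof(int));
}

/* For pointers, assign NULL explicitly instead of relying on
 * all-bytes-zero being the NULL pointer representation. */
char **null_table(size_t n) {
    char **t = malloc(n * sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        t[i] = NULL;
    }
    return t;
}
```

On most current platforms calloc would give NULL pointers as well; the explicit loop just avoids depending on that.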

So what is what? Here is the fixed code. Once you have allocated a block of memory, there is no way to ask how many items it holds, so it is best to always store that count somewhere. I've added a member dim to the struct to hold the count.

Cell makeCell(int dim) {
  /* automatic storage duration => need to init manually */
  Cell newCell;

  /* note that in case dim is zero, we can either get NULL or a 
   * unique non-null value back from malloc. This depends on the
   * implementation. */
  newCell.subcells = malloc(dim * sizeof(*newCell.subcells));
  newCell.dim = dim;

  /* the following can be used as a check for an out-of-memory 
   * situation:
   * if(newCell.subcells == NULL && dim > 0) ... */
  for(int i = 0; i < dim; i++) {
    newCell.subcells[i] = makeCell(dim - 1);
  }

  return newCell;
}

Now, things look like this for dim=2:

Cell { 
  subcells => { 
    Cell { 
      subcells => { 
        Cell { subcells => {}, dim = 0 }
      }, 
      dim = 1
    },
    Cell { 
      subcells => { 
        Cell { subcells => {}, dim = 0 }
      }, 
      dim = 1
    }
  },
  dim = 2
}

Note that in C, the return value of a function need not be an object; no storage at all is required to exist for it. Consequently, you are not allowed to modify it. For example, the following is not possible:

makeCell(0).dim++;

You will need a "free function" that frees the allocated memory again, because storage for allocated objects is not freed automatically. You have to call free for every subcells pointer in your tree. It's left as an exercise for you to write that up :)
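
For readers who want to check their solution, here is one possible sketch, not the only way to do it. It assumes the struct carries the dim member described above; the freeCell name is hypothetical, and makeCell is repeated with a guard so that dim never claims more subcells than were actually allocated.

```c
#include <stdlib.h>

typedef struct Cell {
    struct Cell *subcells;
    int dim;               /* number of entries in subcells */
} Cell;

/* Same idea as the fixed makeCell, with an out-of-memory guard. */
Cell makeCell(int dim) {
    Cell newCell;
    newCell.subcells = NULL;
    newCell.dim = 0;
    if (dim <= 0) {
        return newCell;            /* leaf: nothing to allocate */
    }
    newCell.subcells = malloc((size_t)dim * sizeof *newCell.subcells);
    if (newCell.subcells == NULL) {
        return newCell;            /* allocation failed: dim stays 0 */
    }
    newCell.dim = dim;
    for (int i = 0; i < dim; i++) {
        newCell.subcells[i] = makeCell(dim - 1);
    }
    return newCell;
}

/* Frees the children first, then this cell's own subcells array. */
void freeCell(Cell *cell) {
    for (int i = 0; i < cell->dim; i++) {
        freeCell(&cell->subcells[i]);
    }
    free(cell->subcells);
    cell->subcells = NULL;
    cell->dim = 0;
}
```

Note that freeCell does not free the Cell it is handed, only what that Cell owns, because the top-level Cell was returned by value and lives in the caller's storage.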

Johannes Schaub - litb
+25  A: 

Short answer: It isn't allocated for you.

Slightly longer answer: The subcells pointer is uninitialized and may point anywhere. This is a bug, and you should never allow it to happen.

Longer answer still: Automatic variables are allocated on the stack; global variables are laid out by the compiler and linker and usually occupy a dedicated data segment. Global variables are initialized to zero by default. Automatic variables have no default value (they simply hold whatever was already in that memory), and the programmer is responsible for giving them good starting values (though many compilers will try to clue you in when you forget).

The newCell variable in your function is automatic and is not initialized. You should fix that pronto. Either give newCell.subcells a meaningful value right away, or set it to NULL until you allocate some space for it. That way, on most platforms, you'll get a segmentation violation if you try to dereference it before allocating memory for it.
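
A minimal sketch of the "point it at NULL" advice; the emptyCell and hasSubcells helpers are hypothetical names used only for illustration.

```c
#include <stddef.h>

typedef struct Cell {
    struct Cell *subcells;
} Cell;

/* Returns a Cell whose pointer is in a known state instead of
 * holding whatever happened to be on the stack. */
Cell emptyCell(void) {
    Cell c;
    c.subcells = NULL;
    return c;
}

/* Returns 1 if storage has been attached to the cell, 0 otherwise.
 * Check this before indexing into subcells. */
int hasSubcells(const Cell *c) {
    return c->subcells != NULL;
}
```

With the pointer set to NULL, a forgotten allocation shows up as a failed check (or, on most systems, an immediate crash) rather than as silent memory corruption.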

Also decide what subcells holds. As declared, subcells[i] is a Cell, so storing the Cell that makeCell returns by value is fine once subcells points at allocated storage. The alternative is to return a pointer to a heap-allocated Cell and make subcells an array of such pointers.

A usual idiom for this would have the form something like

Cell* makeCell(int dim){
  // assumes the struct declares: struct Cell **subcells;
  Cell *newCell = malloc(sizeof(Cell));
  // error checking here
  newCell->subcells = malloc(sizeof(Cell*)*dim); // what if dim=0?
  // more error checking
  for (int i=0; i<dim; ++i){
    newCell->subcells[i] = makeCell(dim-1);
    // what error checking do you need here? 
    // depends on your other error checking...
  }
  return newCell;
}

though I've left you a few problems to hammer out..

And note that you have to keep track of all the bits of memory that will eventually need to be deallocated...
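
One way the bookkeeping could look, as a sketch only: the Node type, its count member, and the makeNode / destroyNode names are all hypothetical, chosen to keep this variant apart from the Cell code above. Each node records how many children were successfully created, so a failure halfway through can be cleaned up.

```c
#include <stdlib.h>

/* Hypothetical pointer-based variant of the tree. */
typedef struct Node {
    struct Node **children;  /* array of pointers to heap nodes */
    int count;               /* number of valid entries in children */
} Node;

/* Frees a node and everything below it. Safe to call with NULL. */
void destroyNode(Node *n) {
    if (n == NULL) {
        return;
    }
    for (int i = 0; i < n->count; i++) {
        destroyNode(n->children[i]);
    }
    free(n->children);
    free(n);
}

/* Builds the tree; returns NULL (with nothing leaked) on failure. */
Node *makeNode(int dim) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    n->children = NULL;
    n->count = 0;
    if (dim <= 0) {
        return n;                      /* leaf node */
    }
    n->children = malloc((size_t)dim * sizeof *n->children);
    if (n->children == NULL) {
        free(n);
        return NULL;
    }
    for (int i = 0; i < dim; i++) {
        n->children[i] = makeNode(dim - 1);
        if (n->children[i] == NULL) {
            destroyNode(n);            /* frees the first i children */
            return NULL;
        }
        n->count = i + 1;
    }
    return n;
}
```

Every successful makeNode call should eventually be paired with exactly one destroyNode call on the root.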

dmckee
Global variables are initialized - to zeroes. You omitted to mention file static and function static variables; for the purposes of the exercise, they're more like globals (initialized to zeroes) than automatic variables.
Jonathan Leffler
dmckee
I don't see how you think this function is returning a pointer to an auto variable — the function appears to return a Cell by value. (The rest of your critique is on the money, of course.)
Chuck
@Chuck: structs are *always* returned by reference, aren't they? Or does that depend on the size of the object?
dmckee
Err...it looks like I may be wrong about that...
dmckee
I find claims that struct assignment (and thus return by value) has been supported since C90, and claims that many compilers didn't support it into the mid-1990s, which may be what I recalled. Fixing that too...
dmckee
I suspect that if you hadn't adopted a recursive definition (which carries its own issues, by the way), you wouldn't have this problem.
plinth
@plinth: Well, Ziggy's code is not very enlightening about the intended use of this structure (what with the lack of cargo), so I just followed his approach for now. It'll give him something to work on.
dmckee
A: 

Local variables are "allocated" on the stack. The stack is a preallocated amount of memory to hold those local variables. The variables cease to be valid when the function exits and will be overwritten by whatever comes next.

In your case, the code is doing nothing since it doesn't return your result. Also, a pointer to an object on the stack ceases to be valid when the scope exits, so I guess in your precise case (you seem to be building a linked list), you will need to use malloc().

Coincoin
+3  A: 

My question is, how does C allocate memory when I haven't actually malloc()ed the appropriate amount of memory? What's the default?

To not allocate memory. You have to explicitly create it on the stack or dynamically.

In your example, subcells points to an undefined location, which is a bug. Your function should return a pointer to a Cell struct at some point.

Bernard
A: 

Am I going to end up accessing happy faces stored in memory somewhere, or perhaps writing over previously existing cells, or what?

You are lucky that you got a happy face. On one of those unlucky days, it could've wiped your system clean ;)

My question is, how does C allocate memory when I haven't actually malloc()ed the appropriate amount of memory?

It doesn't. What happens is that when you define your Cell newCell, the subcells pointer is left uninitialized and holds a garbage value. That may be 0 (in which case you'd likely get a crash) or some integer that happens to look like a valid memory address. In such cases, the program will happily fetch whatever value resides there and hand it back to you.

What's the default?

This is the behavior if you don't initialize your variables. And your makeCell function looks a little under-developed.

dirkgently
A: 

There are really three sections where things can be allocated - data, stack & heap.

In the case you mention, it would be allocated on the stack. The problem with allocating something on the stack is that it's only valid for the duration of the function. Once your function returns, that memory is reclaimed. So, if you return a pointer to something allocated on the stack, that pointer will be invalid. If you return the actual object though (not a pointer), a copy of the object will automatically be made for the calling function to use.

If you had declared it as a global variable (e.g. in a header file or outside of a function) it would be allocated in the data section of memory. The memory in this section is allocated automatically when your program starts and deallocated automatically when it finishes.

If you allocate something on the heap using malloc(), that memory is good for as long as you want to use it - until you call free() at which point it is released. This gives you the flexibility to allocate and deallocate memory as you need it (as opposed to using globals where everything is allocated up front and only released when your program terminates).

Eric Petroelje
A: 

I'm going to pretend I'm the computer here, reading this code...

typedef struct Cell {
  struct Cell* subcells;
} Cell;

This tells me:

  • We have a struct type called Cell
  • It contains a pointer called subcells
  • The pointer should be to something of type struct Cell

It doesn't tell me whether the pointer goes to one Cell or an array of Cell. When a new Cell is made, the value of that pointer is undefined until a value is assigned to it. It's Bad News to use pointers before defining them.

Cell makeCell(int dim) {
  Cell newCell;

New Cell struct, with an undefined subcells pointer. All this does is reserve a little chunk of memory to be called newCell that is the size of a Cell struct. It doesn't change the values that were in that memory - they could be anything.

  for(int i = 0; i < dim; i++) {
    newCell.subcells[i] = makeCell(dim -1);

In order to get newCell.subcells[i], a calculation is made to offset from subcells by i, then that is dereferenced. Specifically, this means the value is pulled from that memory address. Take, for instance, i==0... Then we would be dereferencing the subcells pointer itself (no offset). Since subcells is undefined, it could be anything. Literally anything! So, this would ask for a value from somewhere completely random in memory. There's no guarantee of anything with the result. It may print something, it may crash. It definitely should not be done.

  }

  return newCell;
}

Any time you work with a pointer, it's important to make sure it's set to a value before you dereference it. Encourage your compiler to give you any warnings it can; many modern compilers can catch this sort of thing. You can also give pointers cutesy default values like 0xdeadbeef (yup! that's a number in hexadecimal, it's just also a word, so it looks funny) so that they stand out. (The %p option for printf is helpful for displaying pointers, as a crude form of debugging. Debugger programs can also show them quite well.)
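
A rough sketch of both debugging tricks, under assumptions: POISON_PTR, is_poison and describe_pointer are hypothetical names, and converting an integer like 0xdeadbeef to a pointer is implementation-defined, so treat this as a debugging aid rather than portable practice. Such a pointer must never be dereferenced.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical sentinel for "not yet assigned"; never dereference it. */
#define POISON_PTR ((void *)(uintptr_t)0xdeadbeef)

/* Returns 1 if p still holds the sentinel value, 0 otherwise. */
int is_poison(const void *p) {
    return p == POISON_PTR;
}

/* Writes "name = <address>" into buf using %p and returns what
 * snprintf returns. The exact address format is implementation-defined. */
int describe_pointer(char *buf, size_t size, const char *name, const void *p) {
    return snprintf(buf, size, "%s = %p", name, (void *)p);
}
```

For example, calling describe_pointer(buf, sizeof buf, "subcells", cell.subcells) and printing buf shows at a glance whether the pointer still holds the sentinel.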

Kim Reece