views:

106

answers:

4

I write a 'constructor' function that makes a Node in C, compiled with Visual Studio 2008, ANSI C mode.

#include <stdio.h>
#include <stdlib.h>

typedef struct _node
{
  struct _node* next ;
  char* data ;
} Node ;

Node * makeNode()
{
  Node * newNode = (Node*)malloc( sizeof(Node) ) ;

  // uncommenting this causes the program to fail.
  //puts( "I DIDN'T RETURN ANYTHING!!" ) ;
}

int main()
{
  Node * myNode = makeNode() ;
  myNode->data = "Hello there" ;

  // elaborate program, still works

  puts( myNode->data ) ;

  return 0 ;
}

What's surprising to me :

  • * Not returning a value from makeNode() is only a warning,
  • * More surprising is makeNode() __still works__ as long as I don't puts() anything!

What's going on here and is it "ok" to do this (not return the object you create in a C 'constructor' function?)

WHY is it still working? Why does the puts() command cause the program to fail?

+3  A: 

It probably has to do with the last thing on the stack. Without puts(), the last thing on the stack is the node you allocated, and it gets returned. With puts(), the last thing on the stack is the return value of puts(), which is an int, and is returned and used as a pointer, which is probably bad.

Note that, either way, your program is wrong. This is undefined behavior (or some siliarly scary sounding standardese) and shouldn't be relied upon. Make your function work right all the time - don't rely on undefined behavior.

If you want to find out if this is true, you can do this:

Node * makeNode()
{
  Node * newNode;

  puts( "I DIDN'T RETURN ANYTHING!!" ) ;

  newNode = (Node*)malloc( sizeof(Node) ) ;
}

Will probably work the same as the one that doesn't puts(). This will also probably work:

Node * makeNode()
{
  malloc( sizeof(Node) ) ;
}

But you really shouldn't be using any of these. They work by happenstance, and if you were really lucky none of them would work.

Also note that some people (including me) consider the typecasting of the return value of malloc() to be a bad idea, but that is a debatable subject.

Chris Lutz
But I thought malloc() came from the heap..!
bobobobo
Yeah, but the return value is on the stack. And the return value is the pointer, which is on the stack. It _points_ to data on the heap, but the _pointer_ is still passed around on the stack.
Chris Lutz
On x86, simple values like integers and pointers are returned in the `eax` register.
Bastien Léonard
@Bastien - Assuming he's on x86, which the OP hasn't said anything about. It's highly likely, but the same error can plausibly happen on any architecture that uses the same scheme for value passing and returning.
Chris Lutz
The position of `newNode` on the stack would be *above* the return address (to get back to `main`). The real case is probably that `malloc` returned its pointer through `eax`, which was unaltered by the lack of a `return` statement in the function.
dreamlax
@myself: assuming x86 of course :)
dreamlax
he did say it's on visual studio.
rlbond
+6  A: 

The reason that not returning anything is a warning and not an error is probably largely historical. In 'traditional' C, functions didn't need to declare their return type which just defaulted to int. Some functions were written with no explicit return type and didn't return anything, others chose to return something meaningful but still didn't declare a return type. Trying to tighten up return statements or lack thereof would have meant breaking a lot of old-style code.

It might happen to work, but what you are seeing is dependent on things which aren't guaranteed.

What's probably happening is that the return value of a function goes into a particular register. After you call malloc, if you do nothing else and fall of the end of the function, what is returned by malloc appears to be returned by your function as the result is still sitting in the return register after that function call.

If you call some other function, the return value of malloc is lost and what is returned by your function is whatever happened to end up in the return register.

Charles Bailey
That's exactly what's happening, in MS VC++ return values are always placed in eax. So if you call a function and immediately return from your function, eax won't be touched so it will still be set when your function returns. In MS VC++ the call you have will consistently work. I've seen that exact problem in production code myself.
shf301
@shf301: The reason that I was being vague is that it's an implementation detail and the question is a C question. Of course, return values don't always fit in `eax`. In this example, the full width of `rax` might be needed (VS2008 supports x64 as well). In other situations (floating point) `xmm0` (or `st(0)`) might be used or for large values the caller would need to allocate space for the return value and use `rax` or a stack parameter to tell the callee where to write the return value.
Charles Bailey
+4  A: 

In your case, and almost always on x86, the return value is passed via the register EAX.

When you comment out puts(), no code intervenes between the assignment to newNode and the return of your function. This results in pseudo-assembly code such as:

push sizeof(Node)
call malloc              ; allocate memory, places return value in eax
move [newNode], eax      ; make use of return value from malloc
ret                      ; return from your function -- 
                         ; eax still contains malloc() return value

Therefore eax register is not changed and your function appears to return the pointer.

Compare with:

push sizeof(Node)
call malloc              ; allocate memory, places return value in eax
move [newNode], eax      ; make use of return value from malloc
push DWORD PTR ["I DIDN'T RETURN ANYTHING!!"]
call puts                ; during this subroutine, eax will be changed
ret                      ; return from your function -- 
                         ; eax contains garbage value left over from puts()

Heath Hunnicutt
+1  A: 

My guess is that makeNode() stores the address of the allocated object in eax, which happens to be the register used to return pointers on x86.

And calling puts() probably happens to modify eax's value.

If you show the disassembly, we can tell you what really happens.

Bastien Léonard