In the course of this discussion about casting the return value of malloc, many people have claimed that the implicit declaration of malloc would cause the return value to be converted to int and then converted back to T*, possibly truncating the pointer in situations where:

sizeof(int) < sizeof(void*)

This would imply that the compiler does the following:

  1. Links to and calls the correct object code defining malloc
  2. Generates object code to convert the return value to the shorter int type
  3. Generates object code to convert back to the larger destination pointer type

Could someone actually prove that this happens? Say, with some example code on 64-bit Linux?

I'd do it myself, but I don't have access to a 64-bit machine.

+1  A: 

malloc is declared in the stdlib.h header. The C preprocessor copies that declaration directly into your source, and the compiled code is linked with the malloc implementation at a later stage.

When you have code:

#include <stdlib.h>
...
void * foo = malloc(42);

it's actually processed into something like

...
extern void *malloc (size_t __size) __attribute__ ((__nothrow__)) __attribute__ ((__malloc__)) ;
(...lots of other declarations...)
...
void * foo = malloc(42);

When you don't include the function prototype, the declaration defaults to something like

int malloc();
...
void * foo = malloc(42);

This means that the final compiled code will do something like "call malloc with argument 42, convert its return value from int to void*, and store it in foo". It then gets linked with libc, which contains the pre-compiled object code of malloc, and that code obviously returns void*. The result is therefore one extra int-to-void* conversion on the CPU register that holds the return value. I imagine that on a 64-bit architecture this might mean taking the lower 32 bits and putting 32 zeroes in front of them, thus clearing part of the original pointer.
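
As a rough illustration of that last step, the sketch below models the two ways a 32-bit value could be widened back to 64 bits. It assumes a 32-bit int and 64-bit pointers, uses plain integers in place of real pointers, and the function names are made up for this example. Which variant a given compiler uses is implementation-defined.

```c
#include <stdint.h>

/* Keep only the low 32 bits, as happens when a 64-bit return
   value is read as a 32-bit int (assumed widths). */
static uint32_t low_half(uint64_t address)
{
    return (uint32_t)(address & 0xFFFFFFFFu);
}

/* Widen by filling the upper 32 bits with zeroes. */
uint64_t widen_zero_extended(uint64_t address)
{
    return (uint64_t)low_half(address);
}

/* Widen by copying bit 31 into the upper 32 bits, which is what
   converting a signed int to a wider type would do. */
uint64_t widen_sign_extended(uint64_t address)
{
    uint32_t low = low_half(address);
    if (low & 0x80000000u)
        return UINT64_C(0xFFFFFFFF00000000) | low;
    return low;
}
```

For an address such as 0x100100240, both variants give 0x00100240, so the upper part of the pointer is lost either way. The two only differ when bit 31 of the low half is set.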

che
Language lawyer nitpick: malloc is declared, not defined, in stdlib.h!
Thomas Padron-McCarthy
Thanks; I rephrased the answer, hopefully it's ok now.
che
+1  A: 

I think step 2 is not quite as "conscious" a conversion as you imply. When dealing with a function whose return type is unknown, the compiler must make some assumption about how many bytes to "grab". The default is the size of an int.

So if a void* and an int happen to be the same size, well and good; if not, oops!

djna
+6  A: 

The problem with your description of what happens is in step 2. With an implicit declaration, the code at the calling site doesn't "convert" the return value of the function, really.

What happens is that the code at the calling site extracts the return value (typically from a register, or off the stack) by assuming that it's of type "int". The procedure to do this is different for different OSes and compilers, and is typically specified by an ABI document.

For the most common ABIs, the return location and sizes of int and void* are the same, so you actually won't have any problem doing this, even though it's incorrect. I believe this is true for Linux, Windows, and Mac OS X on 32-bit platforms.

On 64-bit platforms, it's more common for "long" and "void *" to be the same size while "int" is smaller, so if you have an implicit declaration for malloc(), the return value will be truncated. There are several popular 64-bit programming models, though.
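
To see which model a given compiler uses, one can compare the sizes of int, long, and void*. The following is a minimal sketch; the enum and function names are hypothetical, and only the four common combinations are recognized.

```c
#include <stdbool.h>
#include <stddef.h>

enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_ILP64, MODEL_OTHER };

/* Classifies a data model from the sizes (in bytes) of int, long, and void*. */
enum data_model classify_data_model(size_t int_size, size_t long_size, size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4) return MODEL_ILP32;
    if (int_size == 4 && long_size == 8 && ptr_size == 8) return MODEL_LP64;
    if (int_size == 4 && long_size == 4 && ptr_size == 8) return MODEL_LLP64;
    if (int_size == 8 && long_size == 8 && ptr_size == 8) return MODEL_ILP64;
    return MODEL_OTHER;
}

/* Reports the model that the current compiler targets. */
enum data_model current_data_model(void)
{
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
}

/* True when reading a pointer as an int would drop bits. */
bool int_is_narrower_than_pointer(size_t int_size, size_t ptr_size)
{
    return int_size < ptr_size;
}
```

Under this classification, truncation is only possible in the LP64 and LLP64 cases, where int is 4 bytes and pointers are 8.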

Back in the "good old days" of DOS development, it was possible to create programs that ran in a mode where "int" was 16 bits, and pointers were 32 bits (actually, 24). In those cases, calling malloc() with an implicit prototype would truncate the returned value.

Note that even in the cases where the return value is truncated, you still might not have a runtime problem, depending on whether the value is actually outside the valid range of an int.
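
That last point can be expressed as a simple range check. This is only a sketch: it treats addresses as unsigned 64-bit integers, counts only the non-negative int range as safe, and the function name is invented for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an address is small enough to be represented
   exactly by a non-negative int, so dropping the upper bits loses nothing. */
bool address_fits_in_int(uint64_t address)
{
    return address <= (uint64_t)INT_MAX;
}
```

With a 32-bit int, a low address such as 0x1000 passes the check, while 0x100100240 does not.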


On Mac OS X, in 64-bit mode, this code:

#include <stdio.h>

int main (int argc, const char * argv[]) {
    int x = malloc(128);
    void *p = malloc(128);
    printf("Hello, World!\nsizeof(int)=%d,sizeof(void*)=%d,x=0x%xd,p=%p\n", sizeof(int), sizeof(void *), x, p);
    return 0;
}

prints:

Hello, World!
sizeof(int)=4,sizeof(void*)=8,x=0x1001c0d,p=0x100100240

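Note that the "x" value has fewer digits than the "p" value, having silently dropped the most-significant 32 bits of the value. (The final "d" in the x output is a literal character from the "0x%xd" format string, not a hex digit.) The actual assembly code at the two calls to malloc looks like this: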

LM2:
    movl $128, %edi
    call _malloc
    movl %eax, -12(%rbp)
LM3:
    movl $128, %edi
    call _malloc
    movq %rax, -8(%rbp)

So, the right value is being returned by malloc (in %rax), but the movl instruction truncates it as it's being moved into variable "x".
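
Since %x drops leading zeros, a truncated value can be mistaken for the high part of the pointer rather than the low part. Here is a small sketch, with hypothetical helper names, that pads both forms to a fixed width so the retained half is easy to pick out:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the full 64-bit value as 16 zero-padded hex digits. */
int format_full64(char *out, size_t out_size, uint64_t value)
{
    return snprintf(out, out_size, "%016" PRIx64, value);
}

/* Writes only the low 32 bits as 8 zero-padded hex digits,
   mirroring what a 32-bit store such as movl keeps. */
int format_low32(char *out, size_t out_size, uint64_t value)
{
    return snprintf(out, out_size, "%08" PRIx32, (uint32_t)value);
}
```

For the value p=0x100100240 from the output above, the full form reads 0000000100100240 and the low 32 bits read 00100240.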

Mark Bessey
At least amd64 Linux has sizeof(int)=4 and sizeof(void*)=8.
che
Yes, almost every 64-bit OS/compiler combination follows the LP64 model: sizeof(int)==4, sizeof(void*)==sizeof(long)==8. Win64/MSVC is the odd one out, following the ILP64 model: sizeof(int)==sizeof(long)==sizeof(void*)==8.
ephemient
Er... What version of MSVC? I'm using MSVC 2005 in x64 mode under Win64 and it's sizeof(int)=4, sizeof(long)=sizeof(void*)=8.
AndreyT
I'm sorry, I don't use Windows and I was going off of data I read elsewhere. Other documentation I found says that Win64/MSVC uses LLP64, where sizeof(int)==sizeof(long)==4, sizeof(void*)==8. Most of the world (Linux, Solaris, BSD, etc.) follows LP64, though.
ephemient
Sorry for the confusion. Minor brain-fart on my part. I'll edit the above to make it (more) correct.
Mark Bessey
Converting between integer and pointer types is peculiar to the compiler you’re using, and in some cases is undefined.
Cirno de Bergerac
@Mark: Do you have access to a 64-bit system where you could demonstrate the truncation? It would be informative to see the exact mechanism by which the truncation occurs, i.e. stack, or different registers, etc.
Robert S. Barnes
Sure, I've got a couple, actually. I'll see if I can post some generated code later.
Mark Bessey
@Mark: Thanks! I have some questions. The first statement, LM2, explicitly assigns the 64-bit pointer that malloc returns to a 32-bit int, storing the return value in the 32-bit eax register and then copying it with movl to x's position on the stack, all of which is pretty much as expected. However, the second call to malloc, LM3, puts the return value in the 64-bit rax register, then uses movq to copy 64 bits to p's position on the stack. At least in this specific instance, your example demonstrates that the implicit declaration of malloc has no effect on the final object code.
Robert S. Barnes
@Mark: The following might make the code and results more clear: printf("Hello, World!\nsizeof(int)=%d\tsizeof(void*)=%d\nx=0x%.16xd\tp=%.16p\n", sizeof(x), sizeof(p), x, p);
Robert S. Barnes
@Mark: Also, it would seem that the assignment to x dropped the least significant 32 bits, not the most significant. Both addresses start with the same 0x1001 sequence...
Robert S. Barnes
+1  A: 

If you omit the declaration (prototype) for malloc, the compiler assumes it returns int. Calls to it are therefore compiled as calls to a function that returns an int result.

How this is done varies depending on your system, so the result may get passed back in a data register, an address register, or on the stack.

The compiler then generates additional code to convert the (presumed) returned int value into a pointer.

Obviously, this is not what you want. You might get lucky on most systems, where ints and pointers are the same width, so the conversion of the returned value essentially does nothing, but you can't rely on this behavior.

So all in all, it's a bad thing not to declare external functions.
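
For comparison, here is a minimal sketch of the correct form, with the prototype brought in by stdlib.h. The helper name is hypothetical.

```c
#include <stdlib.h>   /* declares void *malloc(size_t) and free() */
#include <string.h>   /* declares memset() */

/* Allocates a zero-filled buffer of `count` bytes.
   With the prototype visible, no cast is needed and the full
   pointer value is preserved. Returns NULL on failure. */
unsigned char *make_zeroed_buffer(size_t count)
{
    unsigned char *buffer = malloc(count);
    if (buffer != NULL)
        memset(buffer, 0, count);
    return buffer;
}
```

The caller is expected to check the result for NULL and release it with free().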

Loadmaster
Can you demonstrate this? So far I haven't seen any compiler which actually returns a truncated pointer.
Robert S. Barnes
Probably because all your compilers target CPUs having both 32-bit ints and 32-bit addresses, so shoving an int into a pointer is a no-op. You would see a difference if your pointers were wider than your ints, e.g., on 16-bit MS-DOS in Large or Huge memory mode, or perhaps on a DEC Alpha having 64-bit pointers and 32-bit ints, or on systems that return pointers in different registers than ints (MC68K?).
Loadmaster