Hello, I have written a simple Hello World program.

    #include <stdio.h>

    int main() {
        printf("Hello World");
        return 0;
    }

I wanted to understand what the relocatable object file and the executable file look like. The disassembly of the main function in the object file is

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   bf 00 00 00 00          mov    $0x0,%edi
   9:   b8 00 00 00 00          mov    $0x0,%eax
   e:   e8 00 00 00 00          callq  13 <main+0x13>
  13:   b8 00 00 00 00          mov    $0x0,%eax
  18:   c9                      leaveq 
  19:   c3                      retq 

Here the function call for printf is callq 13. One thing I don't understand is why it is 13. That means "call the function at address 13", right? But 13 is the address of the next instruction. Please explain what this means.

The executable code corresponding to main is

00000000004004cc <main>:
  4004cc:       55                      push   %rbp
  4004cd:       48 89 e5                mov    %rsp,%rbp
  4004d0:       bf dc 05 40 00          mov    $0x4005dc,%edi
  4004d5:       b8 00 00 00 00          mov    $0x0,%eax
  4004da:       e8 e1 fe ff ff          callq  4003c0 <printf@plt>
  4004df:       b8 00 00 00 00          mov    $0x0,%eax
  4004e4:       c9                      leaveq 
  4004e5:       c3                      retq 

Here it is callq 4003c0, but the encoded instruction is e8 e1 fe ff ff. Nothing in those bytes corresponds to 4003c0. What am I getting wrong?

Thanks. Bala

+4  A: 

In the first case, take a look at the instruction encoding: it's all zeroes where the function address would go. That's because the object hasn't been linked yet, so the addresses for external symbols haven't been filled in. When you do the final link into the executable format, the linker puts another placeholder in there (the call to printf@plt in your executable), and the dynamic linker then supplies the correct address for printf() at runtime. Here's a quick example for a "Hello, world" program I wrote.

First, the disassembly of the object file:

00000000 <_main>:
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  -0x4(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 04                sub    $0x4,%esp
  11:   e8 00 00 00 00          call   16 <_main+0x16>
  16:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  1d:   e8 00 00 00 00          call   22 <_main+0x22>
  22:   b8 00 00 00 00          mov    $0x0,%eax
  27:   83 c4 04                add    $0x4,%esp
  2a:   59                      pop    %ecx
  2b:   5d                      pop    %ebp
  2c:   8d 61 fc                lea    -0x4(%ecx),%esp
  2f:   c3                      ret    

Then the relocations:

main.o:     file format pe-i386

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE 
00000012 DISP32            ___main
00000019 dir32             .rdata
0000001e DISP32            _puts

As you can see, there's a relocation for _puts, which is what the compiler turned the call to printf into. That relocation gets noticed at link time and fixed up. In the case of dynamic library linking, the relocations and fixups might not get fully resolved until the program is running, but I hope you get the idea from this example.
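To make the fixup step concrete, here is a minimal sketch of what resolving a PC-relative (DISP32-style) relocation amounts to. The function name `apply_pcrel32` and any addresses passed to it are hypothetical, and the sketch assumes the placeholder holds zero (no in-place addend), as in the listing above. It is not how any particular linker is implemented.

```c
#include <stdint.h>

/* Hypothetical sketch of a PC-relative 32-bit fixup.
 * field:       the 4 placeholder bytes inside the instruction
 * field_addr:  final address of those 4 bytes (assumed known to the linker)
 * symbol_addr: final address of the symbol being called
 * Assumes the placeholder's original contents are zero. */
void apply_pcrel32(uint8_t field[4], uint32_t field_addr, uint32_t symbol_addr)
{
    /* The CPU adds the displacement to the address just past the field,
     * i.e. the address of the next instruction. */
    uint32_t disp = symbol_addr - (field_addr + 4u);

    /* Store the displacement in little-endian byte order. */
    field[0] = (uint8_t)(disp & 0xffu);
    field[1] = (uint8_t)((disp >> 8) & 0xffu);
    field[2] = (uint8_t)((disp >> 16) & 0xffu);
    field[3] = (uint8_t)((disp >> 24) & 0xffu);
}
```

For instance, with a placeholder at the made-up address 0x1e and a symbol at the made-up address 0x100, the stored bytes would be de 00 00 00.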

Carl Norum
Any comment from the downvoter?
Carl Norum
+4  A: 

Calls are relative in x86. IIRC, if the opcode is e8, the target is computed relative to the address of the call instruction + 5.

e1 fe ff ff is a little-endian encoded relative displacement. It really means fffffee1.

Now add this to the address of the call instruction + 5: (0xfffffee1 + 0x4004da + 5) % 2**32 = 0x4003c0
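The same arithmetic can be sketched in C. The helper name `call_rel32_target` is made up for illustration, and truncating the address to 32 bits is an assumption that mirrors the `% 2**32` above; a real 64-bit disassembler would sign-extend the displacement to 64 bits instead.

```c
#include <stdint.h>

/* Hypothetical helper: decode the target of a 5-byte E8 (call rel32)
 * instruction. insn_addr is the address of the E8 opcode byte and
 * bytes points at the 5 instruction bytes. */
uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t *bytes)
{
    /* Reassemble the little-endian displacement that follows the opcode. */
    uint32_t disp = (uint32_t)bytes[1]
                  | ((uint32_t)bytes[2] << 8)
                  | ((uint32_t)bytes[3] << 16)
                  | ((uint32_t)bytes[4] << 24);

    /* Unsigned arithmetic wraps modulo 2^32, which handles the
     * negative (backward) displacement. */
    return insn_addr + 5u + disp;
}
```

Called with the address 0x4004da and the bytes e8 e1 fe ff ff from the question, this yields 0x4003c0.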

Longpoke
The +5 is because it's relative to the *next* instruction after the call, and the call is 5 bytes long.
caf
Calls on x86 can be either relative or absolute. It is just that `E8` is a relative call.
AndreyT
Yeah, I forgot there are also absolute destinations, but those are specified either as a far pointer (segment selector:offset) or indirectly, through a register or memory location holding the address to jump to.
Longpoke
+4  A: 

The target of the call in the E8 instruction (call) is specified as a relative offset from the current instruction pointer (IP) value.

In your first code sample the offset is obviously 0x00000000. It basically says

call +0

The actual address of printf is not known yet, so the compiler just put the 32-bit value 0x00000000 there as a placeholder.

Such an incomplete call with a zero offset will naturally be interpreted as a call to the current IP value. On your platform, the IP is pre-incremented, meaning that while an instruction is being executed, the IP already contains the address of the next instruction. That is, when the 5-byte instruction at address 0xE is executed, the IP contains the value 0x13, so call +0 is interpreted as a call to the instruction at 0x13. This is why you see 0x13 in the disassembly of the incomplete code.
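As a rough sketch of that rule, the target is the address of the next instruction plus the signed offset. The function name `rel_call_target` is hypothetical, and the fixed length of 5 bytes applies only to the E8 form with a 32-bit offset.

```c
#include <stdint.h>

/* Hypothetical helper: target of an E8 call located at call_addr
 * with a signed 32-bit offset rel32. */
uint64_t rel_call_target(uint64_t call_addr, int32_t rel32)
{
    const uint64_t insn_len = 5;              /* E8 opcode + 4 offset bytes */
    uint64_t next_ip = call_addr + insn_len;  /* IP value during execution */

    /* Sign-extend the offset to 64 bits, then add with wraparound. */
    return next_ip + (uint64_t)(int64_t)rel32;
}
```

With the unlinked placeholder, `rel_call_target(0xe, 0)` gives 0x13, which is what the disassembler printed for the object file.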

Once the code is complete, the placeholder offset 0x00000000 is replaced with the actual offset of the printf function in the code. The offset can be positive (forward) or negative (backward). In your case the IP at the moment of the call is 0x4004DF, while the address of the printf function is 0x4003C0. For this reason, the machine instruction contains a 32-bit offset equal to 0x4003C0 - 0x4004DF, which is the negative value -287. So what you see in the code is actually

call -287

-287 is 0xFFFFFEE1 as a 32-bit two's-complement value. This is exactly what you see in your machine code: x86 stores the value in little-endian order, least significant byte first, so the bytes appear as e1 fe ff ff.
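A small sketch of how the offset becomes those four bytes, assuming a little-endian target such as x86. The name `encode_rel32_le` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: write a signed 32-bit offset into out[]
 * in little-endian byte order, least significant byte first. */
void encode_rel32_le(int32_t rel32, uint8_t out[4])
{
    /* Converting to unsigned gives the two's-complement bit pattern. */
    uint32_t u = (uint32_t)rel32;

    out[0] = (uint8_t)(u & 0xffu);
    out[1] = (uint8_t)((u >> 8) & 0xffu);
    out[2] = (uint8_t)((u >> 16) & 0xffu);
    out[3] = (uint8_t)((u >> 24) & 0xffu);
}
```

Encoding -287 this way produces the bytes e1 fe ff ff, matching the operand of the call in the executable.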

AndreyT