So, I am confused about how jump instructions work in an operating system. I thought that the jump instruction set the value in the processor's program counter. But programs can be run in various locations in memory. I see that in x86, there's the JMP EAX instruction, but my C++ code doesn't seem to use this. I compiled some C++ code in VC++:

int main()
{
    int i = 0;
    while (i < 10)
    {
        ++i;
        if (i == 7)
        {
            i += 1;
            continue;
        }
    }
}

This translates to:

    int main()
    {
00411370  push        ebp  
00411371  mov         ebp,esp 
00411373  sub         esp,0CCh 
00411379  push        ebx  
0041137A  push        esi  
0041137B  push        edi  
0041137C  lea         edi,[ebp-0CCh] 
00411382  mov         ecx,33h 
00411387  mov         eax,0CCCCCCCCh 
0041138C  rep stos    dword ptr es:[edi] 
        int i = 0;
0041138E  mov         dword ptr [i],0 
        while (i < 10)
00411395  cmp         dword ptr [i],0Ah 
00411399  jge         main+47h (4113B7h) 
        {
           ++i;
0041139B  mov         eax,dword ptr [i] 
0041139E  add         eax,1 
004113A1  mov         dword ptr [i],eax 
         if (i == 7)
004113A4  cmp         dword ptr [i],7 
004113A8  jne         main+45h (4113B5h) 
         {
          i += 1;
004113AA  mov         eax,dword ptr [i] 
004113AD  add         eax,1 
004113B0  mov         dword ptr [i],eax 
          continue;
004113B3  jmp         main+25h (411395h) 
         }
        }
004113B5  jmp         main+25h (411395h) 
    }
004113B7  xor         eax,eax 
004113B9  pop         edi  
004113BA  pop         esi  
004113BB  pop         ebx  
004113BC  mov         esp,ebp 
004113BE  pop         ebp  
004113BF  ret

So I'm confused: does the instruction jmp 411395h imply that the program is always loaded at the same spot in memory? That seems illogical.

+2  A: 

Most chips have relative jumps (relative to the current location) and virtual addressing.

Kinopiko
Something wrong with the answer? Please leave a comment. Thanks!
Kinopiko
+3  A: 

The memory locations are relative to the process. main is always at the same spot in memory, relative to the beginning of the program.

eduffy
That's not quite true -- some OSes use *address space layout randomization* to load a program at a different address each time it's run to better protect against security threats.
Adam Rosenfield
@Adam, it doesn't matter where it is loaded; the program sees the same address space no matter what the OS does. Otherwise chaos would ensue.
Byron Whitlock
@Byron - exes can be loaded at different addresses. The executable file contains relocation information so that the loader can adjust absolute addresses in the exe if it is not loaded at its preferred address. With exes this isn't that common; it is more common when loading DLLs.
Michael
@Byron - no; that's the whole point of ASLR. Each process gets to see a different layout, so attacks that rely on buffer overflows won't work from run to run (if you are lucky). It requires all code to be relocatable, and it requires the dynamic loader to fix up addresses carefully. Once loaded, the addresses don't change, but the program's components can be at different addresses in different runs of the same executable.
Jonathan Leffler
+2  A: 

No. On x86 (and other architectures, too), most jump instructions are IP-relative: the binary machine codes for the instructions represent an offset from the current instruction pointer. So, no matter what virtual address the code gets loaded at, the jump instructions function correctly.

Adam Rosenfield
+4  A: 

No, there are two things possibly at play here - you don't specify an OS so I'm going to give a general answer.

The first is that an executable file is rarely in the final format. As a simplification, compilation turns source into object files and linking combines object files into an executable.

But the executable has to be loaded into memory and, at that stage, there can be even more modifications done. One of these modifications may be to fix up memory references within the executable to point to memory that has been loaded at different locations.

This can be achieved by the executable file containing a list of addresses within itself that need to be fixed up at load time.

There is also a disconnect between virtual memory and physical memory in many modern operating systems.

When your process starts, you get your own address space (4 GB on 32-bit Windows, I believe) into which your process is loaded. The addresses within this address space have little relationship to your actual physical memory addresses, and the translation between the two is done by a memory management unit (MMU).

In fact, your process could be flying all over the physical address space as it's paged out and in. The virtual addresses will not change however.

paxdiablo
"you don't specify an OS" How many operating systems does Visual C++ run on?
Kinopiko
Well, he said "I see that in x86" and "I compiled some C++ code in VC++" but I took that just as an example since (1) there are no OS-specific tags; and (2) the question is very general in nature: "in assembly", "an operating system".
paxdiablo
+2  A: 

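Relative jumps take the address of the current machine instruction (held in the instruction pointer) and add an offset to compute the address to be jumped to. On x86 the offset is counted from the end of the jump instruction, that is, from the address of the next instruction.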

If you look at your code

004113B3  jmp         main+25h (411395h) 
004113B5  jmp         main+25h (411395h) 
004113B7  xor         eax,eax

you'll note that the jmp instruction is 2 bytes long (1 byte for jmp, 1 byte for offset), and cannot possibly store an absolute 4-byte address.

Relative jumps are basic functionality of CPUs (from what I know about 65xx, Z80, 8086, 68000), and are not related to such advanced features as virtual memory, memory mapping or address space randomization.

devio
+5  A: 

As other people wrote, there are relative jump and relative call instructions which essentially add a fixed value to eip and therefore do not depend on the program's location in memory; compilers prefer to use these whenever possible. You can look at the code bytes to see what exact instructions your compiler used. However, I assume you are asking about jumps/calls to absolute addresses.

When the linker generates an executable, it generates absolute addresses supposing a particular base address; the Microsoft linker usually uses 400000h. When the OS loads an executable or a DLL, it "fixes up" all absolute addresses by adding the difference between the address at which the executable was actually loaded and the address at which the linker based it. All executable formats except .com specify some sort of fixup table, which lists all locations in the executable which have to be patched up in this way. Therefore, after the OS loads your executable into memory at a base address of, say, 1500000h, your jump will look like jmp 1511395h. You can check this by looking at the actual code bytes with a debugger.

Older Windows systems preferred to load executables at the base address used by the linker; this created a security risk, because an attacker would know in advance what is where in memory. This is why newer systems use base address randomization.

Anton Tykhyy
The jmp instructions at 004113B3 and ...B5 must be relative jumps. We can tell from the address labels that these jmp instructions are encoded as two-byte instructions; therefore, they are relative jmp instructions. A two-byte jmp reloads EIP with the address of the next instruction plus a signed 8-bit displacement (-128 to +127), which is the second byte of the instruction. The first byte, the opcode, is EB. Other encodings of jmp begin with E9 (also relative, but with a 16- or 32-bit displacement), EA (absolute far jump), or FF (indirect through a register or memory), so it is important to realize that several different opcodes share the mnemonic "jmp" in assembly language.
Heath Hunnicutt