This is similar to http://stackoverflow.com/questions/1204078/what-happens-when-you-run-a-program, but not a dupe.

Let's say I have a simple console program with two methods, A and B.

    public static void RunSnippet()
    {
        TestClass t = new TestClass();
        t.A(1, 2);

        t.B(3, 4);
    }

    public class TestClass
    {
        public void A(int param1, int param2)
        {
            //do something
            C();
        }

        private void C()
        {
            //do
        }

        public bool B(int param1, int param2)
        {
            //do something
            bool result = true;

            return result;
        }
    }

Can someone explain in detail (but in simple, plain English) what really happens when RunSnippet calls method A and method B (and they internally call other methods)? I want to understand what happens under the hood: how parameters are passed, where they are stored, what happens to local variables, how return values are passed, what happens if another thread starts running when A has called C, and what happens if an exception is thrown.

+8  A: 

I'm not quite sure what level of detail you're looking for, but here's my stab at explaining what's happening:

  1. A new process is created for your executable. That process has a stack segment containing each thread's stack, a data segment for static variables as well as a memory block called the heap for dynamically allocated memory, and a code segment containing the compiled code.
  2. Your code is loaded into the code segment, the instruction pointer is set to the first instruction in your main() method, and the code begins executing.
  3. Object t is allocated from the heap. The address of t is stored on the stack (each thread has its own stack).
  4. t.A() is called by placing the values 1 and 2 and the return address into main() on the stack, then changing the instruction pointer to the start of t.A()'s code.
  5. t.A() calls t.C() by placing the return address to t.A() on the stack and changing the instruction pointer to the start address of t.C()'s code.
  6. t.C() returns by popping the return address to t.A() off of the stack and setting the instruction pointer to that value.
  7. t.A() returns in a similar manner to t.C().
  8. The call to t.B() is very similar to the call to t.A() except that it returns a value. The exact mechanism to return that value is language and platform dependent. Often the value will be returned in a CPU register.
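
The order of calls and returns in the steps above can be made visible with a small instrumented version of the question's class. This is only a sketch: the Log list, the depth counter, and the Enter/Leave helpers are hypothetical additions for illustration, and the indentation in the log stands in for how deep the real call stack is at each point.

```csharp
using System;
using System.Collections.Generic;

public class TestClass
{
    // Hypothetical instrumentation, not part of the original code:
    // a log of events and the current nesting depth of calls.
    public static readonly List<string> Log = new List<string>();
    private static int depth = 0;

    private static void Enter(string name)
    {
        Log.Add(new string(' ', 2 * depth) + "enter " + name);
        depth++;
    }

    private static void Leave(string name)
    {
        depth--;
        Log.Add(new string(' ', 2 * depth) + "leave " + name);
    }

    public void A(int param1, int param2)
    {
        Enter("A");
        C();            // A's frame stays on the stack while C runs
        Leave("A");
    }

    private void C()
    {
        Enter("C");
        Leave("C");
    }

    public bool B(int param1, int param2)
    {
        Enter("B");
        bool result = true;
        Leave("B");
        return result;  // the value travels back to the caller
    }
}

public static class Program
{
    public static void Main()
    {
        TestClass t = new TestClass();
        t.A(1, 2);
        bool r = t.B(3, 4);
        foreach (string line in TestClass.Log)
        {
            Console.WriteLine(line);
        }
        Console.WriteLine("B returned " + r);
    }
}
```

The log shows C entering and leaving while A is still active, and B starting only after A has returned.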

Note: Since your methods are very small, modern compilers will often "inline" them instead of making a classic call. Inlining means taking the code from the methods and injecting them straight into the main() method rather than going through the (slight) overhead of making a function call.

Given your example I don't see how threading could come into the picture directly. If you were to start the executable a second time, it would run in a new process. That means it would get its own code segment, data segment and stack segment, completely isolating it from the first process.

If your code were run inside a larger program that called main() on several threads, it would run almost exactly as previously described. The code is thread safe because it doesn't access any potentially shared resources such as static variables. There is no way that Thread 1 could "see" Thread 2 because all key data (values and pointers to objects) is stored on the thread's local stack.
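
Here is a rough sketch of that point in C#. The names LocalsDemo, SumTo and RunOnTwoThreads are made up for the example. Each thread runs the same method, but the parameter n and the locals total and i exist once per call on that thread's own stack, so the two computations cannot disturb each other. The results array is the only shared object, and each thread writes to a different slot.

```csharp
using System;
using System.Threading;

public static class LocalsDemo
{
    // Uses only its parameter and locals, which live on the calling thread's stack.
    public static int SumTo(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }

    public static int[] RunOnTwoThreads()
    {
        int[] results = new int[2]; // shared heap object, one slot per thread
        Thread t1 = new Thread(() => { results[0] = SumTo(100); });
        Thread t2 = new Thread(() => { results[1] = SumTo(1000); });
        t1.Start();
        t2.Start();
        t1.Join(); // wait for both threads before reading the results
        t2.Join();
        return results;
    }

    public static void Main()
    {
        int[] results = RunOnTwoThreads();
        Console.WriteLine(results[0]); // prints 5050
        Console.WriteLine(results[1]); // prints 500500
    }
}
```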

Eric J.
+1 this looks good. Thanks.
Sandbox
As far as threading is concerned, the question wasn't exactly related to my example. But, say if my example was running on a worker thread and another thread is spawned by the main thread then in that case what happens to the method that is currently being executed?
Sandbox
If main spawns another thread, it will continue executing to the final "}" and return. What happens next depends on your platform and language, but typically when the main method exits, the process is terminated and any threads that are still running are abruptly terminated. (In .NET this applies to background threads; a foreground thread, which is the default for threads created with new Thread(...), keeps the process alive until it finishes.) If you need the work to complete first, you have to wait for the threads you started to terminate. Most languages offer a way to do that gracefully. In C#, you can use thread.Join().
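
A minimal sketch of waiting with Join, assuming a made-up JoinDemo class and a Thread.Sleep as a stand-in for real work. The worker is marked as a background thread so that, without the Join call, the process would be free to exit before the worker is done.

```csharp
using System;
using System.Threading;

public static class JoinDemo
{
    public static string RunWorkerAndWait()
    {
        string message = "not finished";
        Thread worker = new Thread(() =>
        {
            Thread.Sleep(50);      // stand-in for real work
            message = "finished";
        });
        worker.IsBackground = true; // a background thread does not keep the process alive
        worker.Start();
        worker.Join();              // block here until the worker has ended
        return message;
    }

    public static void Main()
    {
        Console.WriteLine(RunWorkerAndWait()); // prints "finished"
    }
}
```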
Eric J.
+1  A: 

You mean at the assembly language level, or at the OS level?

In terms of assembly, what happens when you call a method is that the arguments are pushed on the stack (or placed in registers, depending on the calling convention), and finally the return address is pushed and execution jumps to the address of the method (if it's virtual, there's an extra table lookup to find that address). The code then continues from the address of the method until a "ret" instruction is hit, and execution resumes from where the call was made. You should study assembly and how C is compiled to get a good grip on that process.

At the OS level, there's nothing special involved in invoking the method. All the OS does is allocate CPU time to the process, and the process is responsible for doing what it wants during that time, be it calling methods or whatever. Switching between threads, however, is done by the OS (unless you are using user-level "green" threads, which are scheduled by the language runtime instead).

Dr_Asik
A: 

TestClass t = new A();

I think you mean new TestClass() here.

As for what happens under the hood, the compiler will convert this code to Java bytecode. Here's an excerpt from an article on "How the Java virtual machine handles method invocation and return".

When the Java virtual machine invokes a class method, it selects the method to invoke based on the type of the object reference, which is always known at compile-time. On the other hand, when the virtual machine invokes an instance method, it selects the method to invoke based on the actual class of the object, which may only be known at run time.

The JVM uses two different instructions, shown in the following table, to invoke these two different kinds of methods: invokevirtual for instance methods, and invokestatic for class methods.

Method invocation of invokevirtual and invokestatic

    Opcode          Operand(s)              Description
    invokevirtual   indexbyte1, indexbyte2  pop objectref and args, invoke method at constant pool index
    invokestatic    indexbyte1, indexbyte2  pop args, invoke static method at constant pool index
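
The quoted article is about Java, while the question's code looks like C# (it uses bool). The same distinction between the two kinds of call exists there; the corresponding IL instructions are call and callvirt. A small sketch, with made-up Animal and Dog classes, of what "based on the actual class of the object" means in practice:

```csharp
using System;

public class Animal
{
    // Instance method: the override that runs is picked from the object's actual class.
    public virtual string Speak() { return "..."; }

    // Static (class) method: the target is fixed at compile time.
    public static string Kind() { return "Animal"; }
}

public class Dog : Animal
{
    public override string Speak() { return "Woof"; }
}

public static class DispatchDemo
{
    public static void Main()
    {
        Animal a = new Dog();             // declared type Animal, actual class Dog
        Console.WriteLine(a.Speak());     // prints "Woof"
        Console.WriteLine(Animal.Kind()); // prints "Animal"
    }
}
```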

Pierre-Antoine LaFayette
yes..corrected it. Thanks.
Sandbox
+4  A: 

A function call is essentially a goto statement, except that at the end, it must return to where it got called from.

There's a function call stack that essentially holds information about where to "return".

A function call requires:

  • Push all parameters onto the stack.
  • Store (push) the location of the next instruction on the stack, so the called function knows where to return to when it's done.
  • goto the first instruction in the called function.

When the called function needs to read the parameters, it will read them from the stack.

When the called function is done, or hits a "return" statement, it finds the address it needs to return to, and "goto"s to it.
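
To make the "goto plus a remembered return address" idea concrete, here is a toy machine written in C#. Everything in it (TinyMachine, Instr, and the opcodes push, call, addargs, drop, ret, halt) is invented for this sketch and is not how any real CPU or runtime is implemented. It assumes the parameters are pushed before the return address, as with the x86 call instruction.

```csharp
using System;
using System.Collections.Generic;

public static class TinyMachine
{
    // One instruction: an opcode name plus an integer operand.
    public struct Instr
    {
        public string Op;
        public int Arg;
        public Instr(string op, int arg) { Op = op; Arg = arg; }
    }

    public static int Run(Instr[] code)
    {
        List<int> stack = new List<int>(); // holds parameters and return addresses
        int ip = 0;                        // instruction pointer
        int acc = 0;                       // stands in for a return-value register
        while (code[ip].Op != "halt")
        {
            Instr ins = code[ip];
            switch (ins.Op)
            {
                case "push":    // put a parameter on the stack
                    stack.Add(ins.Arg);
                    ip++;
                    break;
                case "call":    // remember where to come back to, then goto
                    stack.Add(ip + 1);
                    ip = ins.Arg;
                    break;
                case "addargs": // callee reads its two parameters below the return address
                    acc = stack[stack.Count - 2] + stack[stack.Count - 3];
                    ip++;
                    break;
                case "ret":     // goto the remembered return address
                    ip = stack[stack.Count - 1];
                    stack.RemoveAt(stack.Count - 1);
                    break;
                case "drop":    // caller removes the parameters it pushed
                    stack.RemoveRange(stack.Count - ins.Arg, ins.Arg);
                    ip++;
                    break;
                default:
                    throw new InvalidOperationException("unknown op " + ins.Op);
            }
        }
        return acc;
    }

    public static Instr[] AddProgram()
    {
        return new Instr[]
        {
            new Instr("push", 4),    // 0: last argument first
            new Instr("push", 3),    // 1
            new Instr("call", 5),    // 2: push return address 3, goto 5
            new Instr("drop", 2),    // 3: execution resumes here after ret
            new Instr("halt", 0),    // 4
            new Instr("addargs", 0), // 5: body of the called function
            new Instr("ret", 0),     // 6
        };
    }

    public static void Main()
    {
        Console.WriteLine(Run(AddProgram())); // prints 7
    }
}
```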

hasen j
A: 

What t.B(3, 4) does:

    push 4
    push 3
    call B
    add esp, 8 // release the 8 bytes used by the two arguments

call pushes the address of the instruction right after the call onto the stack, then jumps to the address of B():

    push ebp // save the caller's EBP, the caller will need it later
    mov ebp, esp // EBP now marks the base of B's stack frame
    // push any registers I would use other than EAX; I'm not using any
    sub esp, 4 // allocate 4 bytes on the stack to store "result"
    mov dword ptr [ebp-4], 1 // result = 1 (true)
    mov eax, dword ptr [ebp-4] // put "result" in EAX as the return value
    add esp, 4 // free the allocated space
    // pop registers
    mov esp, ebp
    pop ebp
    ret

The method code is shared by all instances of a class. When you create a new object, all that is stored is the object's own variables (its fields). In this case, there are none.

As for multiple threads, each thread has its own stack memory. Nothing special happens to a thread's flow when the kernel switches the processor to another thread. The kernel simply freezes the flow and resumes it later.

Havenard
ebp? esp? eax? am lost
Sandbox
+1  A: 

If you are interested in the assembly-level explanation of what happens, I recommend watching this lecture from CS107 at Stanford University. I found it did a very good job of explaining exactly what the costs of function calls are, in very plain English.

http://www.youtube.com/watch?v=FvpxXmEG1F8&feature=PlayList&p=9D558D49CA734A02&index=9

iano
A: 

(assuming x86) First you have to understand the stack. Functions use an area of memory called the "stack". You can think of it like a stack of plates, where each plate contains a DWORD (32 bits) of data. There is a register in the CPU that keeps track of the current location in the stack (it's just a virtual memory address) that we are dealing with. It's called the stack pointer and is typically stored in the esp register.

When functions interact with the stack, they are typically doing one of two things: a push or a pop. A "push" is when something is put on top of the stack, which consists of moving the stack pointer to the next higher position and then copying something to that new location (the new top). A push "grows the stack" because there is more data being stored there now (more plates).

A "pop" is when the topmost item on the stack is "removed", which consists of copying whatever is currently on top of the stack (being pointed to by the esp register) to a CPU register (typically eax) and then moving the stack pointer to one position lower in the stack.
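
The plate picture can be modelled in a few lines of C#. This is a toy model with invented names (PlateStack, Esp as an array index rather than a real address). It follows the x86 convention that the "top" of the stack moves toward lower addresses as plates are added, so a push decreases Esp and a pop increases it.

```csharp
using System;

public class PlateStack
{
    private readonly uint[] memory;       // each slot stands for one 32-bit "plate"
    public int Esp { get; private set; }  // index of the current top of the stack

    public PlateStack(int slots)
    {
        memory = new uint[slots];
        Esp = slots;                       // empty stack: pointer sits just past the last slot
    }

    public void Push(uint value)
    {
        if (Esp == 0) throw new InvalidOperationException("stack overflow");
        Esp--;                             // move the pointer to the new top first
        memory[Esp] = value;               // then copy the value there
    }

    public uint Pop()
    {
        if (Esp == memory.Length) throw new InvalidOperationException("stack is empty");
        uint value = memory[Esp];          // copy whatever is on top
        Esp++;                             // then move the pointer back
        return value;
    }
}

public static class PlateStackDemo
{
    public static void Main()
    {
        PlateStack stack = new PlateStack(8);
        stack.Push(4);
        stack.Push(3);
        Console.WriteLine(stack.Esp);   // prints 6
        Console.WriteLine(stack.Pop()); // prints 3
        Console.WriteLine(stack.Pop()); // prints 4
        Console.WriteLine(stack.Esp);   // prints 8
    }
}
```

The last value pushed is the first one popped, which is why the arguments to a call come back off the stack in the reverse of the order they were pushed.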

So now we can talk about setting up to call a function.

code

    t.B(3, 4);

assembly

    // here is a push we described above. The function we are in currently is
    // pushing the value "4" onto the stack. This is one of the arguments to the
    // B function we are calling. Note that we push the last argument first
    push 4
    // here is another push. This time we are pushing the next argument to the
    // B function
    push 3
    call B  // this call sets up the context for the next function to run

When a call occurs, we are transitioning the context from the current function to the function being called. The extra pieces of information the function needs to run are the arguments, which we pushed onto the stack.

The new function will now do some housekeeping with the stack to make room for its local variables, as well as saving the stack pointer into a register so that it can be restored once the function returns. If this didn't happen, the calling function would be disoriented when it regains control, with no idea how to access the stuff it had previously put onto the stack, such as its own local variables or the saved stack context of the function that called it.

Now here it is happening in assembly (stealing this from Havenard).

    // Here is the B function making sure that the calling function can get back to
    // its stack context when B returns.
    push ebp
    mov ebp, esp
    // Remember when I said that a push grows the stack? You can also grow it just by
    // moving the stack pointer, as if there were already more plates there.
    // You may wonder why we subtract (sub) from the stack pointer (esp) to grow it.
    // The reason is that the stack "grows down" in memory. In other words, as the stack
    // grows, the memory addresses of the stack get smaller.
    // We subtract 4 because we only need to grow the stack by one plate,
    // so that we can store the local variable 'result' there. If we had 2 local variables
    // we would have subtracted 8.
    sub esp, 4
    // The instructions below simply move the constant value 1 into the local variable
    // 'result'. Local variables are referenced relative to the bottom of the stack
    // context for the current function. That location is stored in the ebp register, which
    // we saw earlier in the function setup above.
    // So now we think of the location where the 'result' variable is stored as "ebp-4".
    // We know that because we put it there.
    mov dword ptr [ebp-4], 1 // result = 1 (true)
    // eax is a special register that contains the return value of the function. That is why
    // you see the value of 'result' (which we know as [ebp-4]) copied into the eax register.
    mov eax, dword ptr [ebp-4]
    // We adjust the stack pointer back to its previous location,
    // before we subtracted to make room for our local variable.
    add esp, 4
    // Our work is done now. Time to clean up for our calling function and
    // leave things as we found them. Our trusty ebp register stores the old stack pointer
    // that our calling function needs to resume its stack context.
    mov esp, ebp
    pop ebp
    ret

I'm sure there are some details I've left out, especially on returning from the B function, but this is a pretty good overview I think.

Christopher Scott