views:

100

answers:

3

I understand that this is an implementation detail. I'm actually curious what that implementation detail is in Microsoft's CLR.

Now, bear with me as I did not study CS in college, so I might have missed out on some fundamental principles.

But my understanding of the "stack" and the "heap" as implemented in the CLR as it stands today is, I think, solid. I'm not going to make some inaccurate umbrella statement such as "value types are stored on the stack," for example. But, in most common scenarios -- plain vanilla local variables, of value type, either passed as parameters or declared within the method and not contained inside a closure -- value type variables are stored on the stack (again, in Microsoft's CLR).

I guess what I'm unsure of is where ref value type parameters come in.

Originally what I was thinking was that, if the call stack looks like this (left = bottom):

A() -> B() -> C()

...then a local variable declared within the scope of A and passed as a ref parameter to B could still be stored on the stack--couldn't it? B would simply need the memory location where that local variable was stored within A's frame (forgive me if that isn't the right terminology; I think it's clear what I mean, anyway).

I realized this couldn't be strictly true, though, when it occurred to me that I could do this:

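// Requires .NET Framework: delegate BeginInvoke/EndInvoke are not
// supported on .NET Core / .NET 5+.
using System;
using System.Runtime.Remoting.Messaging; // AsyncResult
using System.Threading;                  // Thread

delegate void RefAction<T>(ref T arg);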

void A()
{
    int x = 100;

    RefAction<int> b = B;

    // This is a non-blocking call; A will return immediately
    // after this.
    b.BeginInvoke(ref x, C, null);
}

void B(ref int arg)
{
    // Putting a sleep here to ensure that A has exited by the time
    // the next line gets executed.
    Thread.Sleep(1000);

    // Where is arg stored right now? The "x" variable
    // from the "A" method should be out of scope... but its value
    // must somehow be known here for this code to make any sense.
    arg += 1;
}

void C(IAsyncResult result)
{
    var asyncResult = (AsyncResult)result;
    var action = (RefAction<int>)asyncResult.AsyncDelegate;

    int output = 0;

    // This variable originally came from A... but then
    // A returned, it got updated by B, and now it's still here.
    action.EndInvoke(ref output, result);

    // ...and this prints "101" as expected (?).
    Console.WriteLine(output);
}

So in the example above, where is x (in A's scope) stored? And how does this work? Is it boxed? If not, is it subject to garbage collection now, despite being a value type? Or can the memory immediately be reclaimed?

I apologize for the long-winded question. But even if the answer is quite simple, maybe this will be informative to others who find themselves wondering the same thing in the future.

+2  A: 

Look at the generated code with Reflector to find out. My guess is that an anonymous class containing x is generated, like when you use closures (lambda expressions that reference variables in the current stack frame). Forget about this and read the other answers.

jdv
This does not appear to be the case. See my answer for more details.
LBushkin
@LBushkin: Aaaaaaaaaah. So my guess is wrong.
jdv
+3  A: 

The CLR is completely out of the loop on this; it is the job of the JIT compiler to generate the appropriate machine code to pass an argument by reference. That is an implementation detail in itself, since there are different jitters for different machine architectures.

But the common ones do it exactly the way a C programmer does it: they pass a pointer to the variable. That pointer is passed in a CPU register or on the stack frame, depending on how many arguments the method takes.

Where the variable lives doesn't matter; a pointer to a variable in the stack frame of the caller is just as valid as a pointer to a member of a reference type object that's stored on the heap. The garbage collector knows the difference between them by virtue of the pointer value, and adjusts the pointer if necessary when it moves an object.
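
As a minimal sketch of that point (the `Holder` class and `Increment` method are made-up names for illustration, not anything from the question), the same `ref` callee can be handed a local variable or a field of a heap object, and it behaves the same either way:

```csharp
using System;

// Hypothetical reference type; its field lives inside a heap object.
class Holder
{
    public int Field = 100;
}

static class RefSketch
{
    // The callee cannot tell (and does not care) where the int lives.
    static void Increment(ref int arg)
    {
        arg += 1;
    }

    static void Main()
    {
        int local = 100;            // plain local, typically in Main's stack frame
        var holder = new Holder();  // object allocated on the GC heap

        Increment(ref local);         // reference to a stack location
        Increment(ref holder.Field);  // reference into a heap object

        Console.WriteLine(local);        // prints 101
        Console.WriteLine(holder.Field); // prints 101
    }
}
```

Both calls are ordinary synchronous calls, so the caller's frame is still alive while `Increment` runs.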

Your code snippet invokes magic inside the .NET framework that's required to make marshaling calls from one thread to another work. This is the same kind of plumbing that makes Remoting work. To make such a call, a new stack frame has to be created on the thread where the call is performed. The remoting code uses the type definition of the delegate to know what that stack frame should look like. And it can deal with arguments passed by reference: it knows that it needs to allocate a slot in the stack frame to store the pointed-to variable, x in your case. The BeginInvoke call initializes the copy of the x variable in the remoted stack frame.

The same thing happens on the EndInvoke() call: the results are copied back from the stack frame in the threadpool thread. The key point is that there isn't actually a pointer to the x variable; there's a pointer to the copy of it.

I'm not sure this answer is very clear. Some understanding of how CPUs work, and enough C knowledge that the concept of a pointer is crystal clear, can help a lot.
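
To make the copy-in/copy-out idea concrete, here is a rough emulation. It is only a sketch and says nothing about how the framework is actually implemented: `Begin`, `End`, and `PendingCall` are made-up names, the copy is kept in a heap object purely for convenience, and a plain `Thread` stands in for the threadpool.

```csharp
using System;
using System.Threading;

static class CopyInCopyOutSketch
{
    delegate void RefAction<T>(ref T arg);

    // Hypothetical holder for the copied argument and the worker thread.
    sealed class PendingCall<T>
    {
        public T Copy;
        public Thread Worker;
    }

    // "Copy in": snapshot the caller's value, then run the target
    // against the copy on another thread.
    static PendingCall<T> Begin<T>(RefAction<T> target, ref T arg)
    {
        var call = new PendingCall<T> { Copy = arg };
        call.Worker = new Thread(() => target(ref call.Copy));
        call.Worker.Start();
        return call;
    }

    // "Copy out": wait for the target, then write the copy's final
    // value into whatever variable the caller supplies now.
    static void End<T>(PendingCall<T> call, ref T arg)
    {
        call.Worker.Join();
        arg = call.Copy;
    }

    static void B(ref int arg)
    {
        arg += 1;
    }

    static void Main()
    {
        int x = 100;
        var call = Begin<int>(B, ref x);

        int output = 0;
        End(call, ref output);

        Console.WriteLine(x);      // prints 100: the original is untouched
        Console.WriteLine(output); // prints 101: the updated copy
    }
}
```

In this emulation the target never sees the address of `x`, which is why it does not matter whether the caller's frame still exists by the time `B` runs.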

Hans Passant
I think that the OP's example is constructed so that the stack frame of `A()` is no longer available. Hence the question of how the variable is passed by ref to the asynchronous method.
LBushkin
Thanks @LBushkin, I missed that. Post updated.
Hans Passant
Isn't the JITter considered a part of the Common Language Runtime (which itself is an implementation of the CLI Virtual Execution System, see Ecma-335 §12)?
Novox
See also http://stackoverflow.com/questions/601974/clr-vs-jit
Novox
@Novox, well, debatable. But it is a *very* distinct chunk of code in the codebase. And separate DLLs. And separate teams at MSFT that work on it.
Hans Passant
Very interesting! So it seems that a `ref` parameter in the context of a `BeginInvoke`/`EndInvoke` async call is more like a "hybrid" in the sense that the value is passed as a *copy* to `BeginInvoke`, which in turn passes that *copy* by *reference* to `EndInvoke`. Am I understanding that correctly?
Dan Tao
Not so sure about 'hybrid', this is values getting copied from one stackframe to another. Logically it is similar to lambda captures, but with very different implementation details.
Hans Passant
@Hans: Yeah, I don't know about "hybrid" either. But I guess it's just surprising to me that, as we can see in LBushkin's example, the `ref` parameter of `BeginInvoke` does not point to the location of the original variable -- as you say, it is a copy that is passed (as a `ref` parameter). So passing a variable as a `ref` parameter has different behavior in a synchronous versus an asynchronous context.
Dan Tao
Well, work from what you found. How on Earth can it pass a pointer to a local variable when the variable is gone? It doesn't, it passes a pointer to another one. A copy. Which implies that the pointer has a different value. It *has* to work that way because different threads have different stacks.
Hans Passant
@Hans: right, I completely understand. But before I learned about this behavior (you know, that the original doesn't get changed), I guess I thought, maybe the CLR actually puts the value in some long-term storage location since it's going to need to be accessed after the current method goes out of scope. This might sound ridiculous, as for a local variable it really makes no difference (since the original will be out of scope by the time `EndInvoke` is called anyway); but even calling `BeginInvoke` with a `ref` param pointing to a *member field* seems to do the pointer-to-copy thing too.
Dan Tao
+4  A: 
LBushkin
Wow, great test. I didn't even think to try that (actually, I guess I *assumed* that if I called `EndInvoke` from within the frame of **A**, it would invalidate my findings since the whole issue I was uncertain about was how the `ref` parameter is stored once **A**'s frame is no longer available)! It's funny, though; this seems to clear up one point of confusion (the `ref` parameter clearly does not point to the location of the original variable) in exchange for another (so a `ref` parameter passed to `BeginInvoke` isn't really a `ref` parameter at all?).
Dan Tao
@Dan Tao: As I mention above, the IL from Reflector indicates that `ref` arguments to `BeginInvoke()` are indeed passed by ref. I suspect, however, that internally `BeginInvoke()` makes a copy of the value into the `IAsyncResult` object and passes the copy by `ref` to `B()`. Ultimately, only `A()` can observe the inconsistency here, if it chooses to pass a variable other than `x` when calling `EndInvoke()`.
LBushkin
@LBushkin: Yes, as Hans mentions in his updated answer (if I understand him correctly), the `BeginInvoke` call is given a `ref` parameter that points to the location of a *copy* of the original variable. I also tested this with a non-local variable, in fact -- an instance field -- and saw the same behavior (so it isn't just behavior that only matters in this contrived example): passing the field as a `ref` parameter to a `BeginInvoke` call actually did not change the value of the field.
Dan Tao
(LBushkin asked me for an opinion on this answer.) I am not an expert on this area but I believe your analysis is sound. My understanding is that the cross-thread marshaller does copy-in-copy-out semantics in this case, since maintaining a reference to the original variable is clearly unsafe, as the original poster notes.
Eric Lippert
@Eric Lippert: Thanks Eric. One aspect that remains unclear to me is why Begin/EndInvoke are native, rather than managed methods. This may simply be because the CLR provides their implementation, perhaps.
LBushkin