Hey guys,

I thought about writing a language for the sake of writing a language, and now that I'm done with the parser and the AST, I have to do something about the library. Specifically, basic types.

I'm going to use a very basic intermediate representation before I pass that down to LLVM and get native code that way. However, since my internal representation is extremely basic, it has no way to define an int in terms of itself; so the type somehow has to "break through the Matrix" and exist on a lower level than the other, non-primitive types.

This is not the object of my question. Feel free to comment, but that's not what it's about.

The real thing is that, in the process of trying to see how others did it, a friend of mine reflected the System.Int32 class from the .NET Framework (on my end I tried with Mono, and it does the same thing). He found that it contains a single field:

System.Int32 m_value;

And I don't even see how that's possible.

This int really is the "backing integer" of the one you have: if you box an int and use reflection to change the value of its m_value field, you effectively change the value of the integer:

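using System;
using System.Reflection; // BindingFlags

object testInt = 4; // boxes the value 4
Console.WriteLine(testInt); // prints 4

// Overwrite the private m_value field inside the boxed integer.
typeof(System.Int32)
    .GetField("m_value", BindingFlags.NonPublic | BindingFlags.Instance)
    .SetValue(testInt, 5);
Console.WriteLine(testInt); // prints 5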

There's gotta be a rational explanation behind this singularity. How can a value type contain itself? It simply makes no sense. And it's not just there to look good: changing the internal value for another System.Int32 works. What magic does the CLR use to make it work?

My world just collapsed.

+1  A: 

The magic is actually in the boxing/unboxing.

System.Int32 (and its alias int) is a value type, which means that it's stored inline rather than as a separate heap object; for a local variable, that normally means on the stack. The CLR takes your System.Int32 local and simply turns it into 32 bits of stack space.

However, when you write object testInt = 4;, the compiler automatically boxes your value 4 into a reference, since object is a reference type. What you have is a reference that points to a System.Int32, which is now 32 bits of space on the heap somewhere. But the auto-boxed reference to a System.Int32 is called (...wait for it...) System.Int32.

What your code sample is doing is creating a reference System.Int32 and changing the value System.Int32 that it points to. This explains the bizarre behavior.

JSBangs
The point is that `System.Int32` is a value type but yet contains another `System.Int32`, not that I can magically change its value through reflection. That, or I didn't understand your answer.
zneak
@zneak, I think what's happening is that the `System.Int32` that you reflect on is the *reference* version. The *value* gets treated by the compiler as magic, which avoids the infinite loop. This is supported by the link in Jason's answer.
JSBangs
+1  A: 

Check out this thread for a laborious discussion of this mystery.

Jason
+2  A: 

As noted, a 32-bit integer can exist in two varieties. The fast version is four bytes anywhere in memory or in a CPU register (not just the stack). The boxed version is embedded in a System.Object. The declaration for System.Int32 is compatible with the latter. When boxed, it has the typical object header, followed by 4 bytes that store the value. And those 4 bytes map exactly to the m_value member. Maybe you see why there's no conflict here: m_value is always the fast, non-boxed version, because there is no such thing as a boxed boxed integer.
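To make the two varieties concrete, here is a minimal sketch. It assumes a runtime where the private field is still named m_value, as in the .NET Framework and Mono builds discussed in the question; that name is an implementation detail and may not hold elsewhere. BoxingDemo and its method names are made up for illustration.

```csharp
using System;
using System.Reflection;

static class BoxingDemo
{
    // Boxing copies the 4 bytes into a new heap object,
    // so later changes to the local do not affect the box.
    public static int BoxedCopyIsIndependent()
    {
        int plain = 4;
        object boxed = plain; // box: header + 4 bytes on the heap
        plain = 7;            // only the unboxed local changes
        return (int)boxed;    // unbox: copies the 4 bytes back out
    }

    // Returns the declared type of the private backing field,
    // or null if this runtime does not name it m_value (assumption).
    public static Type BackingFieldType()
    {
        FieldInfo field = typeof(int).GetField(
            "m_value", BindingFlags.NonPublic | BindingFlags.Instance);
        return field?.FieldType;
    }

    static void Main()
    {
        Console.WriteLine(BoxedCopyIsIndependent()); // prints 4
        Console.WriteLine(BackingFieldType());
    }
}
```

If the field is found, its type is reported as System.Int32 again, which is the apparent self-reference from the question: the metadata describes the raw 4 bytes using the same type name.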

Both the language compiler and the JIT compiler are keenly aware of the properties of an Int32. The compiler is responsible for deciding when the integer needs to be boxed and unboxed, and it generates the corresponding IL instructions to do so. It also knows which IL instructions allow the integer to be operated on without boxing it first. This is readily evident from the methods implemented by System.Int32: it doesn't have an overload for operator==(), for example. That's done by the CEQ opcode. But it does have an override for Equals(), required to override the Object.Equals() method when the integer is boxed. Your compiler needs to have that same kind of awareness.
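The difference between the opcode path and the Equals() override shows up when the same values are compared raw and boxed. This is only a sketch; EqualityDemo and its method names are hypothetical, and the exact IL emitted (ceq or a compare-and-branch) depends on the compiler and the context.

```csharp
using System;

static class EqualityDemo
{
    // Raw ints: the compiler emits a direct comparison, no method call.
    public static bool RawEquals(int a, int b)
    {
        return a == b;
    }

    // Boxed ints compared with ==: both operands are typed as object,
    // so this is a reference comparison of two distinct boxes.
    public static bool BoxedReferenceEquals(int a, int b)
    {
        object boxA = a;
        object boxB = b;
        return boxA == boxB;
    }

    // Boxed ints compared with Equals(): dispatches to the
    // Int32 override of Object.Equals, which compares the values.
    public static bool BoxedValueEquals(int a, int b)
    {
        object boxA = a;
        object boxB = b;
        return boxA.Equals(boxB);
    }

    static void Main()
    {
        Console.WriteLine(RawEquals(5, 5));            // prints True
        Console.WriteLine(BoxedReferenceEquals(5, 5)); // prints False
        Console.WriteLine(BoxedValueEquals(5, 5));     // prints True
    }
}
```

A compiler for a new language faces the same split: it needs a primitive comparison for unboxed integers and a method to fall back on once the value sits behind an object reference.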

Hans Passant