In C#, any user-defined struct implicitly derives from System.ValueType, and System.ValueType in turn derives from System.Object.

But when we assign a struct to a reference of type object, it gets boxed.

e.g.

struct A{
    public int i;
}

A a = new A(); // a local struct must be definitely assigned before it is read
object obj = a; // boxing takes place here

So my question is - if A is a descendant of System.Object, can't the compiler upcast it to the object type instead of boxing it?

+20  A: 

A struct is a value type. System.Object is a reference type. Value types and reference types are stored and treated differently by the runtime. For a value type to be treated as a reference type, it has to be boxed. From a low-level perspective, this involves copying the value from the stack, where it originally lives, to newly allocated memory on the heap, which also contains an object header. The header is what lets reference types resolve their vtables for virtual method dispatch and other reference-type features (remember that a struct on the stack is just a value with no type information attached; it doesn't contain anything like a vtable and can't be used directly to resolve dynamically dispatched methods). Besides, to treat something as a reference type, you need a reference (pointer) to it, not the raw value itself.

So my question is - if A is a descendant of System.Object, can't the compiler upcast it to the object type instead of boxing it?

At a lower level, a value does not inherit anything. Actually, as I said before, it's not really an object. The fact that A derives from System.ValueType, which in turn derives from System.Object, is something defined at the abstraction level of your programming language (C#), and C# does indeed hide the boxing operation from you pretty well. You don't write anything explicit to box the value, so you can simply think of the compiler as having "upcast" the structure for you. It creates the illusion of inheritance and polymorphism for values, even though values themselves provide none of the tools required for polymorphic behavior.
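To make the copy made by boxing visible, here is a minimal sketch that reuses the struct A from the question. The class name BoxingCopyDemo and the method BoxThenMutate are hypothetical names chosen for illustration only.

```csharp
using System;

struct A
{
    public int i;
}

// Hypothetical demo class; not part of the original question.
static class BoxingCopyDemo
{
    // Boxes a copy of 'a', changes the original afterwards,
    // and returns the value that is stored inside the box.
    public static int BoxThenMutate()
    {
        A a = new A();
        a.i = 1;
        object obj = a;      // boxing: the value is copied into a new object
        a.i = 2;             // changes only the original variable
        return ((A)obj).i;   // unboxing copies the value back out of the box
    }

    static void Main()
    {
        Console.WriteLine(BoxThenMutate());            // prints 1
        object boxed = new A();
        Console.WriteLine(boxed is ValueType);         // prints True
        Console.WriteLine(boxed.GetType().IsValueType); // prints True
    }
}
```

The box keeps the value it was given at the moment of the conversion, which is consistent with boxing being a copy into separate storage rather than a reinterpretation of the original variable.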

Mehrdad Afshari
+1 to you because you explain it so much better than me..
Stan R.
Thanks Mehrdad! That cleared my doubt! :)
Red Hyena
Good answer. Couple minor problems with it though. First, the stack vs heap is irrelevant to boxing; value types need not be on the stack, and they are boxed even if they are on the heap. Second, virtual methods are irrelevant; boxing is never needed to dispatch a virtual method on a struct! Since all structs are sealed, the jitter has sufficient information to exactly determine which method is called at jit time.
Eric Lippert
Eric: I *knew* you were going to comment on that. I mentioned the stack and heap metaphor mainly to point out that you need some kind of pointer to it. Regarding your second point, I think you're referring to the `constrained` IL instruction. What I meant, though, is calling something like `ToString` on a struct cast to `System.Object` or, say, `IComparable.CompareTo` on a boxed integer statically typed as `IComparable`. I think a vtable lookup is required here, isn't it?
Mehrdad Afshari
An invocation of a method on a boxed value type is treated as a "vtable" call, yes; the jitter has no reason to believe that it's anything special. (The question of whether in practice calls to interface methods are what a C++ compiler writer would strictly speaking think of as a "vtable" call is an interesting one but not that germane to this question.) But many people incorrectly believe that calling interface methods on an unboxed struct actually boxes the struct and then does the virtual call; why would the jitter go to all that trouble when the method is already named in the metadata?
Eric Lippert
True. I've had this discussion with people who think `2.ToString()` will box `2`. By the way, is it possible to demonstrate this fact with C# code only? I mean, short of disassembly or digging through `WinDbg`... `System.Object` does not provide a method that mutates a boxed value and I don't know a way to prove this.
Mehrdad Afshari
(And yes, the `constrained.` prefix instruction helps hint to the jitter that a particular invocation can skip the boxing. If you're interested in how interface dispatch does not actually work the same way as virtual method calls, here's an old article that explains it: http://msdn.microsoft.com/en-us/magazine/cc163791.aspx#S12)
Eric Lippert
Hmm, interesting question. Nothing immediately comes to mind.
Eric Lippert
+1  A: 

A struct is a value type by design, hence it needs to be boxed when it is converted to a reference type. A struct derives from System.ValueType, which in turn derives from System.Object.

The mere fact that a struct is a descendant of object does not mean much, since the CLR treats structs differently from reference types at runtime.

Stan R.
+3  A: 

While the designers of .NET certainly didn't need to include boxing, section 4.3 of the C# Language Specification explains the intent behind it quite well, IMO:

Boxing and unboxing enables a unified view of the type system wherein a value of any type can ultimately be treated as an object.

Because value types are not reference types (which System.Object ultimately is), the act of boxing exists in order to have a unified type system where the value of anything can be represented as an object.

This is different from, say, C++, where the type system isn't unified: there is no common base type for all types.
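As a small illustration of that unified view, the sketch below stores values of several types in a single object[] and asks each one for its runtime type. Point, UnifiedTypeDemo, and Describe are hypothetical names introduced only for this example.

```csharp
using System;

// Hypothetical user-defined value type.
struct Point
{
    public int X;
    public int Y;
}

static class UnifiedTypeDemo
{
    // Returns the runtime type name of every element.
    // Value-type elements were boxed when they were stored in the array.
    public static string[] Describe(object[] items)
    {
        var names = new string[items.Length];
        for (int k = 0; k < items.Length; k++)
        {
            names[k] = items[k].GetType().Name;
        }
        return names;
    }

    static void Main()
    {
        // int, double and Point are boxed here; the string is already a reference.
        object[] items = { 42, "text", 3.5, new Point() };
        Console.WriteLine(string.Join(", ", Describe(items)));
        // prints Int32, String, Double, Point
    }
}
```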

casperOne
Strictly speaking, not everything derives from object. On the type side, pointer types are neither convertible to nor derived from object. Type parameter types and interface types do not derive from object but are always convertible to object. It is *values* of non-pointer types which always derive from object. Except for the null value of reference types, which derives from nothing, not being an object. A reference that refers to nothing does not refer to an object; such a reference is convertible to object but does not derive from object.
Eric Lippert
@Eric Lippert: Changed the answer to reflect your concerns.
casperOne
+11  A: 

Here's how I prefer to think about it. Consider the implementation of a variable containing a 32 bit integer. When treated as a value type, the entire value fits into 32 bits of storage. That's what a value type is: the storage contains just the bits that make up the value, nothing more, nothing less.

Now consider the implementation of a variable containing an object reference. The variable contains a "reference", which could be implemented in any number of ways. It could be a handle into a garbage collector structure, or it could be an address on the managed heap, or whatever. But it's something which allows you to find an object. That's what a reference type is: the storage associated with a variable of reference type contains some bits that allow you to reference an object.

Clearly those two things are completely different.

Now suppose you have a variable of type object, and you wish to copy the contents of a variable of type int into it. How do you do it? The 32 bits that make up an integer aren't one of these "reference" things, it's just a bucket that contains 32 bits. References could be 64 bit pointers into the managed heap, or 32 bit handles into a garbage collector data structure, or any other implementation you can think of, but a 32 bit integer can only be a 32 bit integer.

So what you do in that scenario is you box the integer: you make a new object that contains storage for an integer, and then you store a reference to the new object.

Boxing is only necessary if you want to (1) have a unified type system, and (2) ensure that a 32 bit integer consumes 32 bits of memory. If you're willing to reject either of those then you don't need boxing; we are not willing to reject those, and so boxing is what we're forced to live with.
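The scenario above can be sketched in a few lines. This is only an illustration under the assumption of a standard CLR, where each boxing conversion allocates a new object; IntBoxDemo and BoxesAreDistinct are hypothetical names.

```csharp
using System;

static class IntBoxDemo
{
    // Boxes the same int twice. Each conversion creates its own object,
    // so the references differ even though the boxed values are equal.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;
        object second = value;
        return !object.ReferenceEquals(first, second) && first.Equals(second);
    }

    static void Main()
    {
        int i = 123;
        object boxed = i;          // new object holding a copy of the 32 bits
        i = 456;                   // the box is unaffected
        int unboxed = (int)boxed;  // copy the value back out

        Console.WriteLine(unboxed);              // prints 123
        Console.WriteLine(sizeof(int));          // prints 4 (bytes, i.e. 32 bits)
        Console.WriteLine(BoxesAreDistinct(7));  // prints True
    }
}
```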

Eric Lippert
Eric, as always, very nice explanation! Though could you please expand a little bit on what you mean when you say that 'boxing is necessary if you want to have a unified type system.' I am failing to grasp how 'boxing' unifies the type system. Thanks!
SolutionYogi
Thinking about it a bit more, are you suggesting that C# prefers a system where the developer can 'treat' value types like reference types without needing to understand how they are actually implemented by the .NET CLR? And that, to achieve this, 'boxing' becomes a necessary evil? What would it look like if you chose to avoid 'boxing'? How would a developer interact with value types and reference types?
SolutionYogi
Let me rephrase. Three desirable things: (1) value types only contain their data and therefore have a different representation than reference types, (2) all values can be converted to a common unified type, object, and (3) value types never need to be "boxed". Those three desirable things are mutually impossible; you can have at most two of them. We've chosen to have (1) and (2); not having (3) is the price you pay.
Eric Lippert
Similarly, you probably want your camera to be (1) cheap, (2) lightweight, and (3) take good pictures. You only get two out of the three; which two you choose is up to you, but you don't get all three.
Eric Lippert
A: 

Now that the question has been answered, here is a little "trick" related to the topic:

Structs can implement interfaces. If you pass a value type to a function that expects an interface that the value type implements, the value normally gets boxed. Using generics, you can avoid the boxing:

interface IFoo {...}
struct Bar : IFoo {...}

void boxing(IFoo x) { ... }
void byValue<T>(T x) where T : IFoo { ... }

var bar = new Bar();
boxing(bar);
byValue(bar);
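
A runnable sketch of the same idea follows. Note the assumptions: ICounter, Counter, and the method names are hypothetical, and the generic method takes its argument by ref (unlike byValue above) purely so that the difference becomes observable from the caller.

```csharp
using System;

interface ICounter
{
    void Increment();
}

// Hypothetical mutable struct, used only to make boxing observable.
struct Counter : ICounter
{
    public int Count;
    public void Increment() { Count++; }
}

static class GenericNoBoxDemo
{
    // The argument is converted to ICounter, so the struct is boxed
    // and Increment runs on the boxed copy.
    static void IncrementBoxed(ICounter c) { c.Increment(); }

    // T is known to be Counter here; the interface method is called
    // directly on the caller's variable, with no box in between.
    static void IncrementInPlace<T>(ref T c) where T : ICounter { c.Increment(); }

    public static int ViaInterface()
    {
        var c = new Counter();
        IncrementBoxed(c);
        return c.Count;   // 0: only the box was incremented
    }

    public static int ViaGeneric()
    {
        var c = new Counter();
        IncrementInPlace(ref c);
        return c.Count;   // 1: the original was incremented
    }

    static void Main()
    {
        Console.WriteLine(ViaInterface()); // prints 0
        Console.WriteLine(ViaGeneric());   // prints 1
    }
}
```

With a by-value generic parameter, as in byValue above, the callee works on an unboxed copy instead, so no heap allocation is needed for the call.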
helium