Hi!

I'm wondering whether boxing a value type in an object is a special case or whether the "box" constructed by .NET becomes garbage (that the GC has to collect) after any references to it are dropped.

For example, StringBuilder.AppendFormat() has these overloads:

StringBuilder.AppendFormat(string format, object arg0);
StringBuilder.AppendFormat(string format, object arg0, object arg1);
StringBuilder.AppendFormat(string format, object arg0, object arg1, object arg2);
StringBuilder.AppendFormat(string format, params object[] args);

Having those additional overloads for calls with 3 or fewer arguments might indicate that boxing indeed is a special case (or that it pays off, from a performance point-of-view, to avoid array construction).

Theoretically, plain old reference counting, possibly with a pool of reusable boxes, would be a valid implementation, because there can be no references from one box to another, only from .NET objects to a box.

+3  A: 

A value type that is boxed becomes an object on the heap, and like any other object must (and will) be garbage collected once it is no longer referenced.

Creating method overloads with 3 or fewer arguments is (as you observe) to avoid array construction, and is a performance optimization. See "Consider providing special overloads and code paths for calls with a small number of arguments in extremely performance-sensitive APIs" at Members with a Variable Number of Parameters.

However, creating an array is fundamentally different than boxing a value type. Calling any overload of StringBuilder.AppendFormat will always box arguments that are value types, because the parameter is typed as object, whether or not an array is created. For a detailed explanation of boxing, see "Boxing and Unboxing" at .NET: Type Fundamentals.

Bradley Grainger
+6  A: 

First off, just to clarify: creating an array of object references is not boxing. "Boxing" is a term with a very specific meaning in .NET, and I think it's worth sticking to it.

Boxing does create garbage - or rather, each time you box, it creates a new object which is likely to eventually become garbage. (It doesn't have to become garbage - you might have a reference to that object for the rest of the app's lifetime; it's just pretty rare.)

However, you could have a cache for boxing purposes. Indeed, Java does for small numbers. If you write:

Integer x = 5;
Integer y = 5;
System.out.println(x == y); // Reference comparison

then that's guaranteed to print true.
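
To make the cache visible, here is a minimal sketch. The class name BoxCacheDemo and the helper sameBox are made up for illustration. It calls Integer.valueOf explicitly, which is what javac emits for autoboxing and which is documented to cache values from -128 to 127; whether larger values share a box depends on the JVM, so the sketch makes no claim about them.

```java
class BoxCacheDemo {
    // Hypothetical helper: boxes the same int twice and reports whether
    // both boxes are the same object (reference comparison, not equals()).
    static boolean sameBox(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second;
    }

    public static void main(String[] args) {
        // Values inside the documented cache range share one box.
        System.out.println(sameBox(5));     // true
        System.out.println(sameBox(-128));  // true
        System.out.println(sameBox(127));   // true

        // Outside that range, reference equality is not something to rely
        // on, so compare boxed values with equals() instead.
        Integer a = Integer.valueOf(1000);
        Integer b = Integer.valueOf(1000);
        System.out.println(a.equals(b));    // true
    }
}
```

Every boxing operation outside the cached range may allocate a fresh object, which is the garbage the question is asking about.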

However, that's just a small cache for a fixed set of types - it's not a general purpose cache. A general cache would have to use weak references (not reference counting - the GC mechanism in .NET just isn't reference counted, and you couldn't really introduce that just for boxed values), and the pain of maintaining it would almost certainly hurt performance more than the small cost of boxing creating garbage.

.NET could have taken the same approach as Java and cached the boxes for some values of some types, but I'm not sure it's worth the extra conceptual baggage - especially when the platform supports custom value types (which Java doesn't).

It's probably worth noting that since .NET 2.0, boxing is somewhat rarer than it used to be. It happens a fair amount in data binding and reflection, but it's less common in plain old data manipulation now.

Jon Skeet
I am sticking to the specific meaning of "boxing" in .NET. I was just wondering if those overloads in the StringBuilder class were there just to prevent *additional* garbage in the form of the array, or whether they would prevent *all* garbage because, e.g., boxing an integer in an object would not produce any garbage.
Cygon
@Cygon: I think bringing the automatic array creation with a "params" parameter somewhat obscured your question then - the overloads are completely irrelevant to boxing. Yes, the overloads could potentially avoid creating extra garbage, although I don't believe they do - I believe the overloads for individual parameters actually call one with an array, creating the array explicitly. It does help languages which don't support parameter arrays though.
Jon Skeet
I do not think that your Java code is *guaranteed* to print `true`. It *will* print `true` with Sun's JVM, because the implementation of `Integer.valueOf(int)` (the method internally used for boxing) caches `Integer` instances for values from -128 to 127, but that is not a *requirement* of the Java platform. Other JVMs could claim conformance without such caching.
Thomas Pornin
@Thomas: No, it *is* guaranteed. See section 5.1.7 of the Java Language Specification, third edition.
Jon Skeet
A: 

You are asking the wrong question.

The overloads that you are pointing to exist to optimize direct parameter passing. That is, the compiler will put the variables into arg_0, arg_1, arg_2, arg_3. It is possible to have more than that, but the IL only has these as quick access. The rest goes through the stack anyway, and is therefore not much more efficient than the params-typed function call.

For the params-typed function call, it actually makes an array behind the scenes and sends that to the function as arg_1 (in this case, where arg_0 is taken up by the string).

Cine
Sorry, no. I know all that (yes, down to the level of calling conventions) and I believe my question shows that I'm *not* asking about the sense of the StringBuilder overloads. I've quoted the StringBuilder as a possible indication that boxing might avoid garbage (on the basis that avoiding the array *might* not only be for performance reasons but also to reduce unnecessary garbage).
Cygon