I'm looking for a clear, concise and accurate answer.
Ideally as the actual answer, although links to good explanations welcome.
I'm looking for a clear, concise and accurate answer.
Ideally as the actual answer, although links to good explanations welcome.
Boxing & unboxing is the process of converting a primitive value into an object oriented wrapper class (boxing), or converting a value from an object oriented wrapper class back to the primitive value (unboxing).
For example, in java, you may need to convert an int
value into an Integer
(boxing) if you want to store it in a Collection
because primitives can't be stored in a Collection
, only objects. But when you want to get it back out of the Collection
you may want to get the value as an int
and not an Integer
so you would unbox it.
Boxing and unboxing is not inherently bad, but it is a tradeoff. Depending on the language implementation, it can be slower and more memory intensive than just using primitives. However, it may also allow you to use higher level data structures and achieve greater flexibility in your code.
These days, it is most commonly discussed in the context of Java's (and other language's) "autoboxing/autounboxing" feature. Here is a java centric explanation of autoboxing.
In .Net:
Often you can't rely on what the type of variable a function will consume, so you need to use an object variable which extends from the lowest common denominator - in .Net this is object
.
However object
is a class and stores its contents as a reference.
List<int> notBoxed = new List<int> { 1, 2, 3 };
int i = notBoxed[1]; // this is the actual value
List<object> boxed = new List<object> { 1, 2, 3 };
int j = (int) boxed[1]; // this is an object that can be 'unboxed' to an int
While both these hold the same information the second list is larger and slower. Each value in the second list is actually a reference to an object that holds the int.
This is called boxed because the int is wrapped by the object. When its cast back the int is unboxed - converted back to it's value.
For value types (i.e. all structs
) this is slow, and potentially uses a lot more space.
For reference types (i.e. all classes
) this is far less of a problem, as they are stored as a reference anyway.
Thanks for the clarification @Justin Standard
from C# 3.0 In a Nutshell:
Boxing is the act of casting a value type into a reference type:
int x = 9;
object o = x; // boxing the int
unboxing is... the reverse:
object o = 9; int x = (int)o; // unboxing o
The .NET FCL generic collections:
List<T>
Dictionary<TKey, UValue>
SortedDictionary<TKey, UValue>
Stack<T>
Queue<T>
LinkedList<T>
were all designed to overcome the performance issues of boxing and unboxing in previous collection implementations.
For more, see chapter 16, CLR via C# (2nd Edition).
Boxed values are data structures that are minimal wrappers around primitive types*. Boxed values are typically stored as pointers to objects on the heap.
Thus, boxed values use more memory and take at minimum two memory lookups to access: once to get the pointer, and another to follow that pointer to the primitive. Obviously this isn't the kind of thing you want in your inner loops. On the other hand, boxed values typically play better with other types in the system. Since they are first-class data structures in the language, they have the expected metadata and structure that other data structures have.
In Java and Haskell generic collections can't contain unboxed values. Generic collections in .NET can hold unboxed values with no penalties. Where Java's generics are only used for compile-time type checking, .NET will generate specific classes for each generic type instantiated at run time.
Java and Haskell have unboxed arrays, but they're distinctly less convenient than the other collections. However, when peak performance is needed it's worth a little inconvenience to avoid the overhead of boxing and unboxing.
* For this discussion, a primitive value is any that can be stored on the call stack, rather than stored as a pointer to a value on the heap. Frequently that's just the machine types (ints, floats, etc), structs, and sometimes static sized arrays. .NET-land calls them value types (as opposed to reference types). Java folks call them primitive types. Haskellions just call them unboxed.
** I'm also focusing on Java, Haskell, and C# in this answer, because that's what I know. For what it's worth, Python, Ruby, and Javascript all have exclusively boxed values. This is also known as the "Everything is an object" approach.
Like anything else, autoboxing can be problematic if not used carefully. The classic is to end up with a NullPointerException and not be able to track it down. Even with a debugger. Try this:
public class TestAutoboxNPE
{
public static void main(String[] args)
{
Integer i = null;
// .. do some other stuff and forget to initialise i
i = addOne(i); // Whoa! NPE!
}
public static int addOne(int i)
{
return i + 1;
}
}