I am in a classic design dilemma. I am writing a C# data structure for containing a value and measurement unit tuple (e.g. 7.0 millimeters) and I am wondering if I should use a reference type or a value type.

The benefits of a struct should be less heap action giving me better performance in expressions and less stress on the garbage collector. This would normally be my choice for a simple type like this, but there are drawbacks in this concrete case.

The tuple is part of a rather general analysis result framework where the results are presented in different ways in a WPF application depending on the type of the result value. This kind of weak typing is handled exceptionally well by WPF with all its data templates, value converters and template selectors. The implication is that the value will undergo a lot of boxing / unboxing if my tuple is represented as a struct. In fact, the use of the tuple in expressions will be minor compared to its use in boxing scenarios. To avoid all the boxing, I am considering declaring my type as a class. Another worry about a struct is that there could be pitfalls with two-way binding in WPF, since it would be easier to end up with copies of the tuples somewhere in the code rather than copies of references.

I also have some convenient operator overloading. I am able to compare, say, millimeters with centimeters without problems using overloaded comparison operators. However, I don't like the idea of overloading == and != if my tuple is a class, since the convention is that == and != are ReferenceEquals for reference types (unlike System.String, which is another classic discussion). If == and != are overloaded, someone will write if (myValue == null) and get a nasty runtime exception when myValue one day turns out to be null.

Yet another aspect is that there is no clear way in C# (unlike in e.g. C++) to distinguish reference and value types in code usages, yet the semantics are very different. I worry that users of my tuple (if declared as a struct) will assume that the type is a class, since most custom data structures are, and will therefore assume reference semantics. That is another argument for preferring classes: it's what the user expects, and there is no "." / "->" to tell them apart. In general I would almost always use a class unless my profiler tells me to use a struct, simply because class semantics are what fellow programmers most likely expect, and C# gives only vague hints as to whether a type is one thing or the other.

So my questions are:

What other considerations should I weigh in when deciding if I should go value or reference?

Can == / != overloading in a class be justified in any circumstances?

Programmers assume stuff. Most would probably assume that something called a "Point" is a value type. What would you assume if you read some code with a "UnitValue"?

What would you choose given my usage description?

+1  A: 

However, I don't like the idea of overloading == and != if my tuple is a class, since the convention is that == and != are ReferenceEquals for reference types

No, the convention is slightly different:

(unlike System.String, which is another classic discussion).

Nah, it’s the same discussion.

The crux is not whether a type is a reference type; it’s whether the type behaves as a value. This is true for String, and it should be true for any class for which you care to overload operator == and !=.

There are only two things that you should take care of when designing a type that is logically a value: make it immutable (see other discussions here on Stack Overflow), and implement the comparison semantics properly:

If == and != are overloaded, someone will write if (myValue == null) and get a nasty runtime exception when myValue one day turns out to be null.

There should be no exception (after all, (string)null == null doesn’t yield an exception either!); an exception here would be a bug in the implementation of the overloaded operator.

Konrad Rudolph
A: 

Maybe you can get some inspiration from this recent blog post by Eric Lippert. The most important thing to remember when using structs is to make them immutable. Here's an interesting blog post by Jon Skeet showing how a mutable struct can lead to problems that are very hard to debug.
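As a minimal sketch of the kind of pitfall meant here (the MutablePoint and MutableStructDemo names are made up for illustration and are not taken from either blog post): reading a struct out of a List<T> gives you a copy, so a change made to that copy never reaches the list.

```csharp
using System.Collections.Generic;

// Hypothetical mutable struct, for illustration only.
public struct MutablePoint
{
    public int X;
}

public static class MutableStructDemo
{
    // Returns the X stored in the list after "modifying" the element.
    public static int ModifyThroughCopy()
    {
        var points = new List<MutablePoint> { new MutablePoint { X = 1 } };

        MutablePoint p = points[0]; // the indexer returns a copy of the struct
        p.X = 42;                   // changes only the local copy

        return points[0].X;         // the element inside the list is untouched
    }
}
```

With an immutable struct the assignment to p.X would not compile, so this mistake could not be made silently.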

Ronald Wildenberg
I was just thinking about this...again. We are going to make this the most popular post ever :)
Bryan
I actually think this is in danger of becoming one of those guidelines treated as a rule, as there are places where mutable structs can make sense. Likewise, immutable reference types are relatively undervalued. Still, it is true for most structs, and in the question here immutability is more of a no-brainer than value vs. reference type.
Jon Hanna
A: 

I am not sure that the performance penalty of boxing/unboxing your value in the UI code should be your main concern here. This perf hit will be minor compared to, for example, the layout process.

In fact, you could formulate your question another way: do you want your type to be mutable or immutable? I think immutability would be logical given your specs. It's a value; you said so yourself by naming it UnitValue. As a developer, I would be rather surprised if a UnitValue were not a value ;) => Use an immutable struct

Furthermore, null makes no sense for a measurement. Equality and comparison should also be implemented following measurement rules.
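A rough sketch of what such an immutable struct could look like, under some assumptions: the LengthUnit enum, the choice of double for the value, and normalising everything to millimeters are all invented here, since the question doesn't show the real type.

```csharp
using System;

// Hypothetical unit set; the question does not say which units exist.
public enum LengthUnit { Millimeter, Centimeter }

// Immutable value type: both properties are set once in the constructor.
public readonly struct UnitValue : IEquatable<UnitValue>, IComparable<UnitValue>
{
    public double Value { get; }
    public LengthUnit Unit { get; }

    public UnitValue(double value, LengthUnit unit)
    {
        Value = value;
        Unit = unit;
    }

    // Normalise to millimeters so that different units compare by magnitude.
    private double InMillimeters =>
        Unit == LengthUnit.Centimeter ? Value * 10.0 : Value;

    // Exact double comparison is a simplification; real code may want a tolerance.
    public bool Equals(UnitValue other) => InMillimeters == other.InMillimeters;
    public override bool Equals(object obj) => obj is UnitValue other && Equals(other);
    public override int GetHashCode() => InMillimeters.GetHashCode();
    public int CompareTo(UnitValue other) => InMillimeters.CompareTo(other.InMillimeters);

    public static bool operator ==(UnitValue a, UnitValue b) => a.Equals(b);
    public static bool operator !=(UnitValue a, UnitValue b) => !a.Equals(b);
    public static bool operator <(UnitValue a, UnitValue b) => a.CompareTo(b) < 0;
    public static bool operator >(UnitValue a, UnitValue b) => a.CompareTo(b) > 0;
}
```

With this sketch, new UnitValue(20, LengthUnit.Millimeter) == new UnitValue(2, LengthUnit.Centimeter) is true, and there is no null case to worry about in the operators.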

No, I don't see any pertinent reason to use a reference type rather than a value type in your case.

Maupertuis
The types are used in other places than the UI, but they need to be stored in an "object" for the UI presentation. My worry is that someone will use the type in performance-heavy expressions in the future.
Holstebroe
@Holstebroe, but if someone is doing something performance-heavy with it, why would they access it as an object and deal with all that boxing and unboxing? If something turned out to be performance-heavy, they'd look at that, change it to work on the struct type directly, and the problem would be gone, possibly (depending on what that performance-heavy thing was) with a gain over reference types.
Jon Hanna
+8  A: 

The benefits of a struct should be less heap action giving me better performance in expressions and less stress on the garbage collector

Given without any context, this is a vast--and dangerous--overgeneralization. A struct is not automatically eligible for the stack. A struct can be placed on the stack if (and only if) its lifetime and exposure do not extend outside of the function that's declaring it, it doesn't get boxed within that function, and it meets probably a host of other criteria that don't come to mind immediately. This means that capturing it in a lambda expression or delegate will cause it to be stored on the heap anyway. The point is not to worry about it, because there's a 99.9% chance that your bottlenecks are somewhere else.
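To make the boxing part concrete, here is a small sketch (the Length struct and BoxingDemo class are hypothetical stand-ins, not the asker's types). Each conversion of a struct value to object copies it into a fresh heap object, so boxing the same value twice yields two distinct references.

```csharp
// Hypothetical stand-in for the question's tuple.
public struct Length
{
    public double Millimeters;
}

public static class BoxingDemo
{
    // Boxing the same struct value twice produces two distinct heap objects.
    public static bool BoxesAreSameObject()
    {
        var length = new Length { Millimeters = 7.0 };

        object first = length;   // boxing: copies the value into a new heap object
        object second = length;  // boxing again: another new heap object

        return object.ReferenceEquals(first, second); // false: two separate boxes
    }
}
```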

As for operator overloading, there's nothing stopping you (either technically or philosophically) from overloading operators on your type. While you're technically correct in that equality comparisons between reference types are, by default, semantically equivalent to object.ReferenceEquals, this is not a be-all and end-all rule. There are two basic things to keep in mind about operator overloading:

1.) (And this may be the most important from a practical perspective) Operators are not polymorphic. That is, you will only use operators defined on the types as they are referenced, not as they actually exist.

For example, if I declare a type Foo that defines an overloaded equals operator that always returns true, then I do this:

Foo foo1 = new Foo();
Foo foo2 = new Foo();
object obj1 = foo2;

bool compare1 = foo1 == foo2; // true: Foo's overloaded operator is used
bool compare2 = foo1 == obj1; // false: reference comparison, although obj1 holds foo2

Even though obj1 is, in reality, an instance of Foo, the overloaded operator is not defined for the static type (object) through which I'm referencing that instance, so the comparison falls back to reference comparison.

2.) Comparison operations should be deterministic. It should not be possible to compare the same two instances using the overloaded operator and be able to yield differing results. Practically, this sort of requirement usually results in the types being immutable (since being able to tell the difference between one or more values in a class yet getting true from an equals operator is rather counterintuitive), but fundamentally it just means that you should not be able to alter a state value within an instance that will alter the result of a comparison operation. If it makes sense in your scenario to be able to mutate some of the instance state information without having it affect the result of a comparison, then there's no reason you shouldn't. That's just a rare case.
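Coming back to the first point, the contrast with Equals may help. The following is only a sketch; this Foo is a made-up type that treats every Foo as equal to every other Foo, as described above. The operator is picked at compile time from the static types of the operands, while Equals is a virtual call that reaches the override regardless of how the instance is referenced.

```csharp
// Hypothetical type: every Foo is considered equal to every other Foo.
public class Foo
{
    public static bool operator ==(Foo a, Foo b) => true;
    public static bool operator !=(Foo a, Foo b) => false;

    public override bool Equals(object obj) => obj is Foo;
    public override int GetHashCode() => 0;
}

public static class OperatorDispatchDemo
{
    public static bool ViaOperatorOnFoo()
    {
        Foo foo1 = new Foo(), foo2 = new Foo();
        return foo1 == foo2;        // overload chosen at compile time: true
    }

    public static bool ViaOperatorOnObject()
    {
        object obj1 = new Foo(), obj2 = new Foo();
        return obj1 == obj2;        // static type is object: reference comparison, false
    }

    public static bool ViaVirtualEquals()
    {
        object obj1 = new Foo(), obj2 = new Foo();
        return obj1.Equals(obj2);   // virtual call reaches Foo.Equals: true
    }
}
```

So code that only sees the value as object (which is the asker's UI scenario) should compare with Equals rather than ==.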

Adam Robinson
A: 

In my opinion, your design calls for value type semantics for your tuple. <7.0, mm> should always be equal to <7.0, mm> from a programmer's point of view. <7.0, mm> is exactly the sum of its parts and has no identity of its own. Anything else I would find very confusing. This kind of implies immutability as well.

Now, whether you implement this with structs or classes depends on performance and on whether you have to support null values for every tuple. If you go for structs, you can get away with Nullable<T> if you only need to support null in a few cases.
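For the few places that do need "no value", something along these lines might be enough. This is a sketch only; Measurement and NullableDemo are hypothetical names, and Measurement? is the C# shorthand for Nullable<Measurement>.

```csharp
using System.Globalization;

// Hypothetical stand-in for the question's tuple.
public struct Measurement
{
    public double Millimeters;
}

public static class NullableDemo
{
    // Only this method accepts "no value"; the struct itself can never be null.
    public static string Describe(Measurement? m)
    {
        if (!m.HasValue)
            return "no measurement";

        // Invariant culture keeps the decimal separator predictable.
        return m.Value.Millimeters.ToString(CultureInfo.InvariantCulture) + " mm";
    }
}
```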

Also, can't you provide a reference type wrapper for your tuples, to be used for display purposes? I am not too familiar with WPF, but I would imagine that this would eliminate all of the boxing operations.
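What I have in mind is roughly the following; Quantity and QuantityView are invented names, and I am assuming the display code only ever needs an object reference. Converting the wrapper to object is a plain reference conversion, so no new box is allocated each time it is handed over.

```csharp
// Hypothetical value type standing in for the question's tuple.
public struct Quantity
{
    public double Value;
    public string Unit;
}

// Hypothetical reference-type wrapper intended for display code.
public sealed class QuantityView
{
    public Quantity Inner { get; }

    public QuantityView(Quantity inner)
    {
        Inner = inner; // the struct is copied once, into the wrapper
    }
}

public static class WrapperDemo
{
    public static bool SameObjectWhenPassedAsObject()
    {
        var view = new QuantityView(new Quantity { Value = 7.0, Unit = "mm" });

        object first = view;   // reference conversion, no boxing
        object second = view;  // still the same heap object

        return object.ReferenceEquals(first, second); // true
    }
}
```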

TheFogger
A: 

data structure for containing a value and measurement unit tuple (e.g. 7.0 millimeters)

Sounds like it has value semantics. The framework provides a mechanism for creating types with value semantics, namely struct. Use that.

Almost everything you say in the next paragraph of your question, both for and against value types, is a matter of optimising based on how the type will interact with the implementation details of the runtime. Since there are both pros and cons in this regard, there's no clear efficiency winner. Since you can't find the clear efficiency winner without actually trying it, any attempt to optimise in this regard will clearly be premature. As much as I'm sick to death of that quote about premature optimisation being bandied about the moment somebody tries to make something faster or smaller, it does apply here.

One thing though that isn't about optimisation:

I don't like the idea of overloading == and != if my tuple is a class, since the convention is that == and != are ReferenceEquals for reference types

Not true at all. The default is that == and != deal with reference equality, but that's mostly because it's the only meaningful default without more knowledge of the semantics of the class. == and != should be overloaded when it fits a class's semantics to do so; ReferenceEquals should be used when reference equality is the only thing one cares about.

If == and != are overloaded, someone will write if (myValue == null) and get a nasty runtime exception when myValue one day turns out to be null.

Only if the == overload has a newbie mistake. The normal approach would be:

public static bool operator == (MyType x, MyType y)
{
  // Use ReferenceEquals for the null checks so this operator does not call itself.
  if(ReferenceEquals(x, null))
    return ReferenceEquals(y, null);
  if(ReferenceEquals(y, null))
    return false;
  return x.Equals(y);
}

And of course, the Equals overload should also check for the parameter being null and return false if it is, for people calling it directly. There isn't even a significant performance impact on calling this over the default == behaviour when one or both values are null, so what's the concern?
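Put together, a null-safe class might look like the sketch below. The single Value property and the NullComparisonDemo helper are assumptions made up for the example; the asker's real type will have more to it.

```csharp
using System;

// Hypothetical immutable reference type; the name MyType follows the snippet above.
public sealed class MyType : IEquatable<MyType>
{
    public double Value { get; }

    public MyType(double value) { Value = value; }

    // Null-safe for callers that invoke Equals directly.
    public bool Equals(MyType other) =>
        !ReferenceEquals(other, null) && Value == other.Value;

    public override bool Equals(object obj) => Equals(obj as MyType);
    public override int GetHashCode() => Value.GetHashCode();

    public static bool operator ==(MyType x, MyType y)
    {
        // ReferenceEquals avoids recursing into this operator.
        if (ReferenceEquals(x, null)) return ReferenceEquals(y, null);
        return x.Equals(y);
    }

    public static bool operator !=(MyType x, MyType y) => !(x == y);
}

public static class NullComparisonDemo
{
    public static bool NullEqualsNull() { MyType a = null; return a == null; }
    public static bool ValueEqualsNull() => new MyType(7.0) == null;
    public static bool SameValueEquals() => new MyType(7.0) == new MyType(7.0);
}
```

None of the three demo methods throws: a null operand simply makes the comparison come out as equal to null or not.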

Yet another aspect is that there is no clear way in C# (unlike in e.g. C++) to distinguish reference and value types in code usages, yet the semantics are very different.

Not really. The default semantics as far as equality goes are pretty different, but since you are describing something that is intended to have value semantics, that leans toward making it a value type rather than a class type. Beyond that, the semantics available are much the same. The mechanisms can differ as far as boxing, reference sharing and so on go, but that's back to optimisation again.

Can == / != overloading in a class be justified in any circumstances?

I would rather ask, can not overloading == and != be justified when that's the sensible thing to do for the class?

As for what I as a programmer would assume about "UnitValue", I'd probably assume it was a struct, since it sounds like it should be. But actually, I wouldn't even assume that, as I mostly won't care until I do something with it where it's important, which given that it also sounds like it should be immutable, is a reduced set (the semantic differences between mutable reference types and mutable structs are greater in practice, but this one is a no-brainer immutable).

Jon Hanna