[Note: This question had the original title "C (ish) style union in C#" but as Jeff's comment informed me, apparently this structure is called a 'discriminated union']

Excuse the verbosity of this question.

There are a couple of similar-sounding questions to mine already on SO, but they seem to concentrate on the memory-saving benefits of the union or on using it for interop. Here is an example of such a question.

[http://stackoverflow.com/questions/126781/c-union-in-c]

My desire to have a union type thing is somewhat different.

I am writing some code at the moment which generates objects that look a bit like this

public class ValueWrapper
{
    public DateTime ValueCreationDate;
    // ... other meta data about the value

    public object ValueA;
    public object ValueB;
}

Pretty complicated stuff, I think you will agree. The thing is that ValueA can only be of a few certain types (let's say string, int and Foo (which is a class)) and ValueB can be another small set of types. I don't like treating these values as objects (I want the warm snuggly feeling of coding with a bit of type safety).

So I thought about writing a trivial little wrapper class to express the fact that ValueA logically is a reference to a particular type. I called the class Union because what I am trying to achieve reminded me of the union concept in C.

public class Union<A, B, C>
{
    private readonly Type type; 
    public readonly A a;
    public readonly B b;
    public readonly C c;

    public A A{get {return a;}}
    public B B{get {return b;}}
    public C C{get {return c;}}

    public Union(A a)
    {
        type = typeof(A);
        this.a = a;
    }

    public Union(B b)
    {
        type = typeof(B);
        this.b = b;
    }

    public Union(C c)
    {
        type = typeof(C);
        this.c = c;
    }

    /// <summary>
    /// Returns true if the union contains a value of type T
    /// </summary>
    /// <remarks>The type of T must exactly match the type</remarks>
    public bool Is<T>()
    {
        return typeof(T) == type;
    }

    /// <summary>
    /// Returns the union value cast to the given type.
    /// </summary>
    /// <remarks>If the type of T does not exactly match A, B or C, then the value <c>default(T)</c> is returned.</remarks>
    public T As<T>()
    {
        if(Is<A>())
        {
            return (T)(object)a;    // Is this boxing and unboxing unavoidable if I want the union to hold value types and reference types? 
            //return (T)a;          // This will not compile: Error = "Cannot cast expression of type 'A' to 'T'."
        }

        if(Is<B>())
        {
            return (T)(object)b; 
        }

        if(Is<C>())
        {
            return (T)(object)c; 
        }

        return default(T);
    }
}

Using this class ValueWrapper now looks like this

public class ValueWrapper2
{
    public DateTime ValueCreationDate;
    public  Union<int, string, Foo> ValueA;
    public  Union<double, Bar, Foo> ValueB;
}

which is something like what I wanted to achieve but I am missing one fairly crucial element - that is compiler enforced type checking when calling the Is and As functions as the following code demonstrates

    public void DoSomething()
    {
        if(ValueA.Is<string>())
        {
            var s = ValueA.As<string>();
            // .... do something
        }

        if(ValueA.Is<char>()) // I would really like this to be a compile error
        {
            char c = ValueA.As<char>();
        }
    }

IMO it is not valid to ask ValueA if it is a char since its definition clearly says it is not - this is a programming error and I would like the compiler to pick up on this. [Also, if I could get this correct then (hopefully) I would get IntelliSense too - which would be a boon.]

In order to achieve this I would want to tell the compiler that the type T can be one of A, B or C

    public bool Is<T>() where T : A 
                           or T : B // Yes I know this is not legal!
                           or T : C 
    {
        return typeof(T) == type;
    } 

Does anyone have any idea if what I want to achieve is possible? Or am I just plain stupid for writing this class in the first place?

Thanks in advance.

+5  A: 

Well, I don't think it's stupid to strive for greater type safety, but I do think you're on the wrong track. What about simply creating getters and setters instead of using public fields?

For example:

public object ValueA { get; private set; }

public void SetValueA( int value ) { ValueA = value; }
public void SetValueA( string value ) { ValueA = value; }
public void SetValueA( Foo value ) { ValueA = value; }
Peter Ruderman
I like this idea. Usually, having a hard time doing something is indicative of the fact that you are not doing it in the way that's natural for the language. I think this is a more natural approach for C#.
jdmichal
@Peter. I shied away from this solution since it would require 3 setters and 3 getters for ValueA and another 3 + 3 for ValueB (plus I would still need a method to determine which setter had been called in the case where 2 or more of the possible types were reference types).
Chris F
@jdmichal. I agree that usually if it seems unduly difficult there is often an easier solution that has been overlooked (like that proposed by Jaroslav), but I personally think this solution is a little too inelegant.
Chris F
+1  A: 

You could throw exceptions once there's an attempt to access variables that haven't been initialized, i.e. if it's created with an A parameter and later on there's an attempt to access B or C, it could throw, say, InvalidOperationException. You'd need a getter to make it work though.
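
A minimal sketch of that idea, cut down to two cases for brevity and using .NET's InvalidOperationException as the exception type. The names CheckedUnion, AsA and AsB are made up for illustration and do not come from the question:

```csharp
using System;

// Hypothetical two-case union: each getter throws if the other case is stored.
public sealed class CheckedUnion<A, B>
{
    private readonly A a;
    private readonly B b;
    private readonly bool holdsA;   // records which constructor was used

    public CheckedUnion(A a) { this.a = a; holdsA = true; }
    public CheckedUnion(B b) { this.b = b; holdsA = false; }

    public A AsA
    {
        get
        {
            if (!holdsA) throw new InvalidOperationException("Union holds a B, not an A.");
            return a;
        }
    }

    public B AsB
    {
        get
        {
            if (holdsA) throw new InvalidOperationException("Union holds an A, not a B.");
            return b;
        }
    }
}
```

With new CheckedUnion<int, string>(5), reading AsA gives back 5 and reading AsB throws. The mistake is still only caught at runtime, so this does not give the compile-time check the question asks for.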

mr popo
Yes - the first version that I wrote did raise an exception in the As method - but whilst this certainly highlights the problem in the code, I much prefer to be told about this at compile time rather than at runtime.
Chris F
+2  A: 
char foo = 'B';

bool bar = foo is int;

This results in a warning, not an error. If you're looking for your Is and As functions to be analogs for the C# operators, then you shouldn't be restricting them in that way anyhow.

Adam Robinson
+3  A: 

I am not sure I fully understand your goal. In C, a union is a structure that uses the same memory locations for more than one field. For example:

typedef union
{
    float real;
    int scalar;
} floatOrScalar;

The floatOrScalar union could be used as a float, or an int, but they both consume the same memory space. Changing one changes the other. You can achieve the same thing with a struct in C#:

[StructLayout(LayoutKind.Explicit)]
struct FloatOrScalar
{
    [FieldOffset(0)]
    public float Real;
    [FieldOffset(0)]
    public int Scalar;
}

The above structure uses 32 bits total, rather than 64 bits. This is only possible with a struct. Your example above is a class, and given the nature of the CLR, makes no guarantee about memory efficiency. If you change a Union<A, B, C> from one type to another, you are not necessarily reusing memory...most likely, you are allocating a new object on the heap and dropping a different pointer in the backing object field. Contrary to a real union, your approach may actually cause more heap thrashing than you would otherwise get if you did not use your Union type.
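
To make the overlap concrete, here is a small sketch that repeats the struct so it stands alone and adds a helper, OverlapDemo.BitsOf, which is a made-up name used only for this illustration. It writes the float field and reads the same four bytes back through the int field:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct FloatOrScalar
{
    [FieldOffset(0)] public float Real;
    [FieldOffset(0)] public int Scalar;
}

// Hypothetical helper, not part of the answer above.
public static class OverlapDemo
{
    // Stores a float, then returns the int that shares its storage.
    public static int BitsOf(float value)
    {
        FloatOrScalar u = new FloatOrScalar();
        u.Real = value;
        return u.Scalar;
    }
}
```

OverlapDemo.BitsOf(1.0f) returns 0x3F800000, the IEEE 754 bit pattern of 1.0f, and Marshal.SizeOf(typeof(FloatOrScalar)) reports 4 bytes.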

jrista
As I mentioned in my question, my motivation was not better memory efficiency. I have changed the question title to better reflect what my goal is - the original title of "C(ish) union" is in hindsight misleading
Chris F
A discriminated union makes a whole lot more sense for what you are trying to do. As for making it compile-time checked...I would look into .NET 4 and Code Contracts. With Code Contracts, it may be possible to enforce a compile-time Contract.Requires that enforces your requirements on the .Is<T> operator.
jrista
I guess I still have to question the use of a Union, in general practice. Even in C/C++, unions are a risky thing, and must be used with extreme care. I am curious why you need to bring such a construct into C#...what value do you perceive getting out of it?
jrista
+2  A: 

If you allow multiple types, you cannot achieve type safety (unless the types are related).

You can't and won't achieve any kind of type safety, you could only achieve byte-value-safety using FieldOffset.

It would make much more sense to have a generic ValueWrapper<T1, T2> with T1 ValueA and T2 ValueB, ...

P.S.: when talking about type-safety I mean compile-time type-safety.

If you need a code wrapper (performing business logic on modifications), you can use something along the lines of:

public class Wrapper
{
    public ValueHolder<int> v1 = 5;
    public ValueHolder<byte> v2 = 8;
}

public struct ValueHolder<T>
    where T : struct
{
    private T value;

    public ValueHolder(T value) { this.value = value; }

    public static implicit operator T(ValueHolder<T> valueHolder) { return valueHolder.value; }
    public static implicit operator ValueHolder<T>(T value) { return new ValueHolder<T>(value); }
}

For an easy way out you could use (it has performance issues, but it is very simple):

public class Wrapper
{
    private object v1;
    private object v2;

    public T GetValue1<T>() { if (v1.GetType() != typeof(T)) throw new InvalidCastException(); return (T)v1; }
    public void SetValue1<T>(T value) { v1 = value; }

    public T GetValue2<T>() { if (v2.GetType() != typeof(T)) throw new InvalidCastException(); return (T)v2; }
    public void SetValue2<T>(T value) { v2 = value; }
}

//usage:
Wrapper wrapper = new Wrapper();
wrapper.SetValue1("aaaa");
wrapper.SetValue2(456);

string s = wrapper.GetValue1<string>();
DateTime dt = wrapper.GetValue1<DateTime>();//InvalidCastException
Jaroslav Jandek
Your suggestion of making ValueWrapper generic seems like the obvious answer but it causes me problems in what I am doing. Essentially, my code is creating these wrapper objects by parsing some text line. So I have a method like ValueWrapper MakeValueWrapper(string text). If I make the wrapper generic then I need to change the signature of MakeValueWrapper to be generic, and this in turn means that the calling code needs to know what types are expected, and I just don't know this in advance before I parse the text...
Chris F
...but even as I was writing the last comment, it felt like I have perhaps missed something (or messed up something) because what I am trying to do does not feel like it should be as difficult as I am making it. I think I will go back and spend a few minutes working on a generified wrapper and see if I can adapt the parsing code around it.
Chris F
The code I have provided is supposed to be just for business logic. The problem with your approach is that you never know what value is stored in the Union at compile-time. It means you will have to use if or switch statements whenever you access the Union object, since those objects do not share a common functionality! How are you going to use the wrapper objects further in your code? Also you can construct generic objects at runtime (slow, but possible). Another easy option is in my edited post.
Jaroslav Jandek
You have basically no meaningful compile-time type checks in your code right now - you could also try dynamic objects (dynamic type checking at runtime).
Jaroslav Jandek
A: 

You can export a pseudo-pattern matching function, like I use for the Either type in my Sasa library. There's currently runtime overhead, but I eventually plan to add a CIL analysis to inline all the delegates into a true case statement.

naasking
A: 

It's not possible to do with exactly the syntax you've used but with a bit more verbosity and copy/paste it's easy to make overload resolution do the job for you:


// this code is ok
var u = new Union<int, string, Foo>("");
if (u.Value(Is.OfType<string>()))
{
    u.Value(Get.ForType<string>());
}

// and this one will not compile
if (u.Value(Is.OfType<char>()))
{
    u.Value(Get.ForType<char>());
}

By now it should be pretty obvious how to implement it:


    public class Union<A, B, C>
    {
        private readonly Type type;
        public readonly A a;
        public readonly B b;
        public readonly C c;

        public Union(A a)
        {
            type = typeof(A);
            this.a = a;
        }

        public Union(B b)
        {
            type = typeof(B);
            this.b = b;
        }

        public Union(C c)
        {
            type = typeof(C);
            this.c = c;
        }

        public bool Value(TypeTestSelector<A> _)
        {
            return typeof(A) == type;
        }

        public bool Value(TypeTestSelector<B> _)
        {
            return typeof(B) == type;
        }

        public bool Value(TypeTestSelector<C> _)
        {
            return typeof(C) == type;
        }

        public A Value(GetValueTypeSelector<A> _)
        {
            return a;
        }

        public B Value(GetValueTypeSelector<B> _)
        {
            return b;
        }

        public C Value(GetValueTypeSelector<C> _)
        {
            return c;
        }
    }

    public static class Is
    {
        public static TypeTestSelector<T> OfType<T>()
        {
            return null;
        }
    }

    public class TypeTestSelector<T>
    {
    }

    public static class Get
    {
        public static GetValueTypeSelector<T> ForType<T>()
        {
            return null;
        }
    }

    public class GetValueTypeSelector<T>
    {
    }

There are no checks for extracting the value of the wrong type, e.g.:


var u = new Union<int, string, Foo>(10);
string s = u.Value(Get.ForType<string>());

So you might consider adding necessary checks and throw exceptions in such cases.

Konstantin Oznobihin
A: 

Here is my attempt. It does compile time checking of types, using generic type constraints.

class Union {
    public interface AllowedType<T> { };

    internal object val;

    internal System.Type type;
}

static class UnionEx {
    public static T As<U,T>(this U x) where U : Union, Union.AllowedType<T> {
        return x.type == typeof(T) ?(T)x.val : default(T);
    }

    public static void Set<U,T>(this U x, T newval) where U : Union, Union.AllowedType<T> {
        x.val = newval;
        x.type = typeof(T);
    }

    public static bool Is<U,T>(this U x) where U : Union, Union.AllowedType<T> {
        return x.type == typeof(T);
    }
}

class MyType : Union, Union.AllowedType<int>, Union.AllowedType<string> {}

class TestIt
{
    static void Main()
    {
        MyType bla = new MyType();
        bla.Set(234);
        System.Console.WriteLine(bla.As<MyType,int>());
        System.Console.WriteLine(bla.Is<MyType,string>());
        System.Console.WriteLine(bla.Is<MyType,int>());

        bla.Set("test");
        System.Console.WriteLine(bla.As<MyType,string>());
        System.Console.WriteLine(bla.Is<MyType,string>());
        System.Console.WriteLine(bla.Is<MyType,int>());

        // compile time errors!
        // bla.Set('a'); 
        // bla.Is<MyType,char>()
    }
}

It could use some prettying-up. Especially, I couldn't figure out how to get rid of the type parameters to As/Is/Set (isn't there a way to specify one type parameter and let C# figure the other one?)
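
As far as I know, C# cannot infer just some of a generic method's type arguments: you either write all of them or none. One possible workaround, sketched here as an assumption rather than as part of the code above, is to pass the result back through an out parameter so that T is inferred from the variable's declared type. TryGet and UnionOutEx are made-up names, and Union and MyType are repeated in trimmed form so the sketch stands alone:

```csharp
using System;

public class Union
{
    public interface AllowedType<T> { }
    internal object val;
    internal Type type;
}

public static class UnionOutEx
{
    // Hypothetical helper: U is inferred from the receiver, T from the out argument.
    public static bool TryGet<U, T>(this U x, out T value)
        where U : Union, Union.AllowedType<T>
    {
        if (x.type == typeof(T))
        {
            value = (T)x.val;
            return true;
        }
        value = default(T);
        return false;
    }

    public static void Set<U, T>(this U x, T newval)
        where U : Union, Union.AllowedType<T>
    {
        x.val = newval;
        x.type = typeof(T);
    }
}

public class MyType : Union, Union.AllowedType<int>, Union.AllowedType<string> { }
```

After bla.Set(234), declaring int n and calling bla.TryGet(out n) needs no explicit type arguments and returns true with n == 234. Declaring char c and calling bla.TryGet(out c) should still be rejected by the compiler, because MyType does not implement Union.AllowedType<char>. The cost is that out var cannot be used here, since the declared type of the variable is what drives the inference.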

Amnon
+4  A: 

I don't really like the type-checking and type-casting solutions provided above, so here's 100% type-safe union which will throw compilation errors if you attempt to use the wrong datatype:

using System;

namespace Juliet
{
    class Program
    {
        static void Main(string[] args)
        {
            Union3<int, char, string>[] unions = new Union3<int,char,string>[]
                {
                    new Union3<int, char, string>.Case1(5),
                    new Union3<int, char, string>.Case2('x'),
                    new Union3<int, char, string>.Case3("Juliet")
                };

            foreach (Union3<int, char, string> union in unions)
            {
                string value = union.Match(
                    num => num.ToString(),
                    character => new string(new char[] { character }),
                    word => word);
                Console.WriteLine("Matched union with value '{0}'", value);
            }

            Console.ReadLine();
        }
    }

    public abstract class Union3<A, B, C>
    {
        public abstract T Match<T>(Func<A, T> f, Func<B, T> g, Func<C, T> h);

        public sealed class Case1 : Union3<A, B, C>
        {
            public readonly A Item;
            public Case1(A item) : base() { this.Item = item; }
            public override T Match<T>(Func<A, T> f, Func<B, T> g, Func<C, T> h)
            {
                return f(Item);
            }
        }

        public sealed class Case2 : Union3<A, B, C>
        {
            public readonly B Item;
            public Case2(B item) { this.Item = item; }
            public override T Match<T>(Func<A, T> f, Func<B, T> g, Func<C, T> h)
            {
                return g(Item);
            }
        }

        public sealed class Case3 : Union3<A, B, C>
        {
            public readonly C Item;
            public Case3(C item) { this.Item = item; }
            public override T Match<T>(Func<A, T> f, Func<B, T> g, Func<C, T> h)
            {
                return h(Item);
            }
        }
    }
}
Juliet
Yup, if you want typesafe discriminated unions, you'll need `match`, and that's as good a way to get it as any.
Pavel Minaev
And if all that boilerplate code gets you down, you can try this implementation which explicitly tags cases instead: http://pastebin.com/EEdvVh2R . Incidentally this style is very similar to the way F# and OCaml represent unions internally.
Juliet
Interesting. I much prefer the version in pastebin - having to explicitly use Case1, 2, 3 above seemed redundant since it can be inferred from the type of the constructor argument.
Chris F
Even though the Match function provides a way of replacing the "if(Is<A>) then do something" code in a typesafe manner I would still want accessors to get at the underlying item (similar to the As methods in my example), but I suppose I could just settle for 3 getters; AsA, AsB and AsC (using Match to access the items just seems too long winded). I would still prefer the As method to be generic as in my example (I just like the syntax) but I cannot have my cake and eat it I suppose.
Chris F
I like Juliet's shorter code, but what if the types are <int, int, string>? How would you call the second constructor?
Robert Jeppesen
@Robert Jeppesen: then you use the first code ;)
Juliet