I have a class like this:

class MyClass<T> {
    public string value1 { get; set; }
    public T objT { get; set; }
}

and a list of this class. I would like to use a .NET 3.5 lambda or LINQ to get a list of MyClass instances that are distinct by value1. I guess this is possible, and much simpler than the .NET 2.0 way of building up a list like this:

List<MyClass<T>> list; 
...
List<MyClass<T>> listDistinct = new List<MyClass<T>>();
foreach (MyClass<T> instance in list)
{
    // some code to check whether listDistinct already contains an object with instance.value1
    // if it does not, then listDistinct.Add(instance);
}

What is the lambda or LINQ way to do it?

+2  A: 

Hmm... I'd probably write a custom IEqualityComparer<T> so that I can use:

var listDistinct = list.Distinct(comparer).ToList();

and write the comparer via LINQ....

Possibly a bit overkill, but reusable, at least:

Usage first:

using System;

static class Program {
    static void Main() {
        var data = new[] {
            new { Foo = 1, Bar = "a" }, new { Foo = 2, Bar = "b" }, new { Foo = 1, Bar = "c" }
        };
        // Keep one item per distinct Foo value.
        foreach (var item in data.DistinctBy(x => x.Foo))
        {
            Console.WriteLine(item.Bar);
        }
    }
}

With utility methods:

public static class ProjectionComparer
{
    public static IEnumerable<TSource> DistinctBy<TSource,TValue>(
        this IEnumerable<TSource> source,
        Func<TSource, TValue> selector)
    {
        var comparer = ProjectionComparer<TSource>.CompareBy<TValue>(
            selector, EqualityComparer<TValue>.Default);
        return new HashSet<TSource>(source, comparer);
    }
}
public static class ProjectionComparer<TSource>
{
    public static IEqualityComparer<TSource> CompareBy<TValue>(
        Func<TSource, TValue> selector)
    {
        return CompareBy<TValue>(selector, EqualityComparer<TValue>.Default);
    }
    public static IEqualityComparer<TSource> CompareBy<TValue>(
        Func<TSource, TValue> selector,
        IEqualityComparer<TValue> comparer)
    {
        return new ComparerImpl<TValue>(selector, comparer);
    }
    sealed class ComparerImpl<TValue> : IEqualityComparer<TSource>
    {
        private readonly Func<TSource, TValue> selector;
        private readonly IEqualityComparer<TValue> comparer;
        public ComparerImpl(
            Func<TSource, TValue> selector,
            IEqualityComparer<TValue> comparer)
        {
            if (selector == null) throw new ArgumentNullException("selector");
            if (comparer == null) throw new ArgumentNullException("comparer");
            this.selector = selector;
            this.comparer = comparer;
        }

        bool IEqualityComparer<TSource>.Equals(TSource x, TSource y)
        {
            if (x == null && y == null) return true;
            if (x == null || y == null) return false;
            return comparer.Equals(selector(x), selector(y));
        }

        int IEqualityComparer<TSource>.GetHashCode(TSource obj)
        {
            return obj == null ? 0 : comparer.GetHashCode(selector(obj));
        }
    }
}
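
Note that the DistinctBy above builds the whole HashSet before returning anything. As a rough sketch of a different technique (not the comparer-based method above), the same idea can be written as a lazy iterator that remembers the keys it has already seen and yields items in source order. The names KeyedDistinctExtensions, DistinctByKey and KeyedDistinctDemo are hypothetical, chosen only for this illustration:

```csharp
using System;
using System.Collections.Generic;

public static class KeyedDistinctExtensions   // hypothetical name
{
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (source == null) throw new ArgumentNullException("source");
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        return DistinctByKeyIterator(source, keySelector);
    }

    private static IEnumerable<TSource> DistinctByKeyIterator<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key was already present,
            // so only the first item for each key is yielded.
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}

static class KeyedDistinctDemo   // hypothetical name
{
    static void Main()
    {
        var data = new[] {
            new { Foo = 1, Bar = "a" }, new { Foo = 2, Bar = "b" }, new { Foo = 1, Bar = "c" }
        };
        foreach (var item in data.DistinctByKey(x => x.Foo))
            Console.WriteLine(item.Bar);   // prints a, then b
    }
}
```

This version uses the default equality comparer for the key type; a custom key comparer could be passed to the HashSet constructor if one is needed.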
Marc Gravell
One question about the code: what is ProjectionComparer? Is it a .NET class, or a LINQ- or IEnumerable-related class that lets you add custom extension methods?
David.Chu.ca
OK. I think "ProjectionComparer" is just a class name you defined, and within that class you have a custom extension method DistinctBy() for IEnumerable, while ProjectionComparer<T> is another helper class, right? Can ProjectionComparer<T> have a different name, instead of the same one?
David.Chu.ca
If I want to get a list of the value1 values of MyClass, can I use this comparer like this: List<string> listValue1s = list.Distinct(comparer).ToList().Select(y => y.value1); Is that right?
David.Chu.ca
The name of ProjectionComparer doesn't matter - you could call it EnumerableExtensions. ProjectionComparer<T> is so named because it provides a Comparer through projection, the common term for getting a new value based on an existing one (value1 from a MyClass, for example). And for your last question: Don't call ToList() unless you need to. If you're not going to use the distinct list of MyClass<T> objects, then you're better off getting your value1's like this: IEnumerable<string> value1s = list.Select(y => y.value1).Distinct();
dahlbyk
thanks dahlbyk; saved me some typing ;-p
Marc Gravell
Marc, do you have any comments on jpbochi's alternative method? It seems there is no need to write a comparer or extension class, and it is much more flexible. For LINQ-to-Objects, it seems to be good enough.
David.Chu.ca
They are the same, in essence, except mine can be used with **any** object, rather than just one specific case.
Marc Gravell
A: 

Check out Enumerable.Distinct(), which can accept an IEqualityComparer:

class MyClassComparer<T> : IEqualityComparer<MyClass<T>>
{
    // Instances are equal if their value1 strings are equal.
    public bool Equals(MyClass<T> x, MyClass<T> y)
    {
        // Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        // Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        // Check whether the value1 properties are equal.
        return x.value1 == y.value1;
    }

    // If Equals() returns true for a pair of objects,
    // GetHashCode must return the same value for these objects.

    public int GetHashCode(MyClass<T> x)
    {
        // Check whether the object is null.
        if (Object.ReferenceEquals(x, null)) return 0;

        // Hash value1, treating a null value1 as the empty string.
        return (x.value1 ?? "").GetHashCode();
    }
}

Your code snippet could look like this:

List<MyClass<T>> list; 
...
List<MyClass<T>> listDistinct = list.Distinct(new MyClassComparer<T>()).ToList();
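
For illustration, here is a self-contained sketch of the comparer in use. The sample data, the Demo class and the DistinctByValue1 helper are hypothetical, and the comparer is restated in compact form only so that the block compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyClass<T>
{
    public string value1 { get; set; }
    public T objT { get; set; }
}

// Compact restatement of the comparer above: equality is decided by value1 only.
class MyClassComparer<T> : IEqualityComparer<MyClass<T>>
{
    public bool Equals(MyClass<T> x, MyClass<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.value1 == y.value1;
    }

    public int GetHashCode(MyClass<T> x)
    {
        return x == null ? 0 : (x.value1 ?? "").GetHashCode();
    }
}

static class Demo   // hypothetical name, for illustration only
{
    public static List<MyClass<int>> DistinctByValue1(List<MyClass<int>> list)
    {
        return list.Distinct(new MyClassComparer<int>()).ToList();
    }

    static void Main()
    {
        // Hypothetical sample data: two items share value1 "ABC".
        var list = new List<MyClass<int>>
        {
            new MyClass<int> { value1 = "ABC", objT = 0 },
            new MyClass<int> { value1 = "ABC", objT = 1 },
            new MyClass<int> { value1 = "123", objT = 2 },
        };

        Console.WriteLine(DistinctByValue1(list).Count);   // prints 2
    }
}
```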
dahlbyk
Hello dahlbyk, thanks for your code and your comments on Marc's post! jpbochi provides an alternative way that does not require writing an extension class. Any comments?
David.Chu.ca
I think each approach has merit. The group-by approach requires the least code and can be more flexible, but has a (slight) performance penalty and at a glance the purpose of the code is not as immediately obvious. Marc's general solution reads quite fluently, but some might say that single expression does too much: it both specifies how items are compared and does the actual select-distinct. My approach is more specific, but provides a clear separation between the equivalence logic and the operation(s) that leverage it.
dahlbyk
Thanks for your complete comments. I agree with you on readability and separation. However, in terms of flexibility, such as getting the second or the last instance for each distinct value, the comparer only returns the first, and it might be complicated to give it the same flexibility, right? See my comments on jpbochi's answer.
David.Chu.ca
Indeed the Distinct-with-Comparer approach would only return the first in the "set". However, I think the semantics of "Distinct" are that the objects should be considered equivalent if they match by your criteria. Once you start picking the First or Last, you've really moved out of a "Distinct" calculation into some sort of aggregation (First, Last, Min, whatever) on a grouping.
dahlbyk
+5  A: 

Both Marc's and dahlbyk's answers seem to work very well. I have a much simpler solution, though. Instead of using Distinct, you can use GroupBy. It goes like this:

var listDistinct
    = list.GroupBy(
     i => i.value1,
     (key, group) => group.First()
    ).ToArray();

Notice that I've passed two functions to the GroupBy(). The first is a key selector. The second gets only one item from each group. From your question, I assumed First() was the right one. You can write a different one, if you want to. You can try Last() to see what I mean.

I ran a test with the following input:

var list = new [] {
    new { value1 = "ABC", objT = 0 },
    new { value1 = "ABC", objT = 1 },
    new { value1 = "123", objT = 2 },
    new { value1 = "123", objT = 3 },
    new { value1 = "FOO", objT = 4 },
    new { value1 = "BAR", objT = 5 },
    new { value1 = "BAR", objT = 6 },
    new { value1 = "BAR", objT = 7 },
    new { value1 = "UGH", objT = 8 },
};

The result was:

//{ value1 = ABC, objT = 0 }
//{ value1 = 123, objT = 2 }
//{ value1 = FOO, objT = 4 }
//{ value1 = BAR, objT = 5 }
//{ value1 = UGH, objT = 8 }

I haven't tested it for performance. I believe that this solution is probably a little slower than one that uses Distinct. Despite this disadvantage, it has two great advantages: simplicity and flexibility. Usually, it is better to favor simplicity over optimization, but it really depends on the problem you're trying to solve.
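
To make the flexibility point concrete, here is a minimal sketch that picks the last item of each group instead of the first. The GroupByDemo class and the LastPerKey helper are hypothetical names used only for this illustration, and the input is a shortened version of the test data above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDemo   // hypothetical name
{
    // Same GroupBy shape as above, but the result selector takes the last item of each group.
    public static TSource[] LastPerKey<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        return source.GroupBy(keySelector, (key, group) => group.Last()).ToArray();
    }

    static void Main()
    {
        var list = new[] {
            new { value1 = "ABC", objT = 0 },
            new { value1 = "ABC", objT = 1 },
            new { value1 = "123", objT = 2 },
            new { value1 = "123", objT = 3 },
            new { value1 = "FOO", objT = 4 },
        };

        foreach (var item in LastPerKey(list, i => i.value1))
            Console.WriteLine(item);
        // { value1 = ABC, objT = 1 }
        // { value1 = 123, objT = 3 }
        // { value1 = FOO, objT = 4 }
    }
}
```

Any other per-group choice works the same way, for example ordering the group by objT before taking the first element.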

jpbochi
Very interesting. Actually, the comparer method has one limitation: it only returns the first item found for each distinct value. If I need the flexibility to get the second, ..., or the last one instead, would group.xxx() be able to do that?
David.Chu.ca
Yes, it would. Simply replace `First()` with `Last()` and see. Of course, you can make any other, more complex selection if you need it.
jpbochi
Was your problem solved? I noticed that you haven't accepted any answer yet...
jpbochi
@David: You should consider accepting this answer. It is a flexible and elegant solution to your problem.
Carlos Loth