I have some objects:

class Foo {
    public Guid id;
    public string description;
}

var list = new List<Foo>();
list.Add(new Foo() { id = Guid.Empty, description = "empty" });
list.Add(new Foo() { id = Guid.Empty, description = "empty" });
list.Add(new Foo() { id = Guid.NewGuid(), description = "notempty" });
list.Add(new Foo() { id = Guid.NewGuid(), description = "notempty2" });

I would like to process this list in such a way that the id field is unique, and throw away the non-unique objects (based on id).

The best I could come up with is:

list = list.GroupBy(i => i.id).Select(g=>g.First()).ToList();

Is there a nicer, better, or quicker way to achieve the same result?

+1  A: 

Create an IEqualityComparer<Foo> which returns true if the id fields are the same, and pass that to the Distinct() operator.

itowlson
I saw that, but I find it messier because it involves defining a new class.
Sam Saffron
But it's more intention-revealing. Anyone reading the code can clearly see that it extracts distinct elements from the sequence; GroupBy-based solutions seem misleading in that they imply you are interested in grouping. I do wish I could use a comparison lambda instead of a whole class though!
itowlson
That's true. I guess you can work around it with some smart extension classes that create the IEqualityComparer on the fly.
Sam Saffron
ok, see my answer ...
Sam Saffron
A: 
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var list = new List<Foo>();
            list.Add(new Foo() { id = Guid.Empty, description = "empty" });
            list.Add(new Foo() { id = Guid.Empty, description = "empty" });
            list.Add(new Foo() { id = Guid.NewGuid(), description = "notempty" });
            list.Add(new Foo() { id = Guid.NewGuid(), description = "notempty2" });

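            // Note: grouping on an anonymous type compares both fields, so this
            // de-duplicates on the (id, description) pair rather than on id alone.
            var unique = from l in list
                         group l by new { l.id, l.description } into g
                         select g.Key;
            foreach (var f in unique)
                Console.WriteLine("ID={0} Description={1}", f.id, f.description);
            Console.ReadKey();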
        }
    }

    class Foo
    {
        public Guid id;
        public string description;
    }
}
David Leon
+5  A: 

A very elegant and intention-revealing option is to define a new extension method on IEnumerable<T>.

So you have:

list = list.Distinct(foo => foo.id).ToList();

And ...

    public static IEnumerable<T> Distinct<T,TKey>(this IEnumerable<T> list, Func<T,TKey> lookup) where TKey : struct {
        return list.Distinct(new StructEqualityComparer<T, TKey>(lookup));
    }


    class StructEqualityComparer<T,TKey> : IEqualityComparer<T> where TKey : struct {

        Func<T, TKey> lookup;

        public StructEqualityComparer(Func<T, TKey> lookup) {
            this.lookup = lookup;
        }

        public bool Equals(T x, T y) {
            return lookup(x).Equals(lookup(y));
        }

        public int GetHashCode(T obj) {
            return lookup(obj).GetHashCode();
        }
    }

A similar helper class can be built for keys that are reference types rather than structs. (It will need better null handling.)
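
As a rough illustration of what such a helper might look like, here is a minimal sketch that drops the struct constraint and guards against null items and null keys. The names KeyEqualityComparer and DistinctByKey are hypothetical and not part of LINQ, and relying on EqualityComparer<TKey>.Default for key comparison is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares items by a key extracted with a lambda.
// Works for both struct and class keys.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> lookup;

    // Assumption: the default comparer for TKey is good enough; it handles null keys.
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> lookup)
    {
        if (lookup == null) throw new ArgumentNullException("lookup");
        this.lookup = lookup;
    }

    public bool Equals(T x, T y)
    {
        // Two null items are equal; a null item never equals a non-null one.
        if (x == null || y == null) return x == null && y == null;
        return keyComparer.Equals(lookup(x), lookup(y));
    }

    public int GetHashCode(T obj)
    {
        // Null items and null keys both hash to 0 instead of throwing.
        if (obj == null) return 0;
        TKey key = lookup(obj);
        return key == null ? 0 : keyComparer.GetHashCode(key);
    }
}

public static class DistinctExtensions
{
    // Hypothetical extension method, named to avoid clashing with Enumerable.Distinct.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> lookup)
    {
        return source.Distinct(new KeyEqualityComparer<T, TKey>(lookup));
    }
}
```

With this in place, a call such as list.DistinctByKey(foo => foo.description).ToList() should work even though the key is a string, and null entries in the sequence collapse to a single null instead of throwing inside the lambda.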

Sam Saffron
+2  A: 

In my informal tests, Distinct() is about 4x faster than GroupBy(). For 1 million Foos, Distinct() takes about 0.89 seconds to produce a unique array from a non-unique one, whereas GroupBy() takes about 3.4 seconds.

My Distinct() call looks like this:

var unique = list.Distinct(FooComparer.Instance).ToArray();

and FooComparer looks like this:

class FooComparer : IEqualityComparer<Foo> {
    public static readonly FooComparer Instance = new FooComparer();

    public bool Equals(Foo x, Foo y) {
        return x.id.Equals(y.id);
    }

    public int GetHashCode(Foo obj) {
        return obj.id.GetHashCode();
    }
}

and my GroupBy() version looks like this:

var unique = (from l in list group l by l.id into g select g.First()).ToArray();
chuckj
+1. I was expecting GroupBy to be slower; good to see the numbers.
Sam Saffron
A: 

It's a lot of work to avoid using a Dictionary. :)
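
No code accompanies this answer here, so the following is only a guess at the kind of dictionary-based approach being suggested. It is a minimal sketch assuming the Foo class from the question; FooDeduplicator and UniqueById are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Foo class in the question.
public class Foo
{
    public Guid id;
    public string description;
}

public static class FooDeduplicator
{
    // Hypothetical helper: keeps the first Foo seen for each id,
    // preserving the order of the input sequence.
    public static List<Foo> UniqueById(IEnumerable<Foo> items)
    {
        var seen = new Dictionary<Guid, Foo>();
        var result = new List<Foo>();
        foreach (var item in items)
        {
            if (!seen.ContainsKey(item.id))
            {
                seen.Add(item.id, item);
                result.Add(item);
            }
        }
        return result;
    }
}
```

Applied to the four-element list from the question, list = FooDeduplicator.UniqueById(list); would keep the first Guid.Empty entry and the two entries with freshly generated ids.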

JP Alioto
How is list = list.Distinct(foo => foo.id).ToList(); a lot of work compared to the 6-line dictionary solution?
Sam Saffron
And a static method and a class all to avoid using the correct data structure.
JP Alioto
The thing is that LINQ gives you a Distinct operator; they just have not provided some really handy overloads. This is a way to work around that issue.
Sam Saffron