views:

1829

answers:

7

IMPORTANT : THIS IS NOT A LINQ-TO-SQL QUESTION. This is LINQ to objects.

Short question:

Is there a simple way in LINQ to objects to get a distinct list of objects from a list, based on a key property on the objects?

Long question:

I am trying to do a Distinct() operation on a list of objects that have a key as one of their properties.

class GalleryImage {
   public int Key { get;set; }
   public string Caption { get;set; }
   public string Filename { get; set; }
   public string[] Tags { get; set; }
}

I have a list of Gallery objects that contain GalleryImage[].

Because of the way the web service works, I have duplicates of the GalleryImage object. I thought it would be a simple matter to use Distinct() to get a distinct list.

This is the LINQ query I want to use :

var allImages = Galleries.SelectMany(x => x.Images);
var distinctImages = allImages.Distinct<GalleryImage>(new 
                     EqualityComparer<GalleryImage>((a, b) => a.Key == b.Key));

The problem is that EqualityComparer is an abstract class.

I don't want to:

  • implement IEquatable on GalleryImage because it is generated
  • have to write a separate class to implement IEqualityComparer as shown here

Is there a concrete implementation of EqualityComparer somewhere that I'm missing?

I would have thought there would be an easy way to get 'distinct' objects from a set based on a key.

+2  A: 

You could group by the key value and then select the top item from each group. Would that work for you?

Charlie Flowers
yes i'm just looking at that actually - via the ToLookup(). maybe inefficient and slow but ok for this task. posting my statement above/below
Simon_Weaver
+2  A: 

This is the best I can come up with for the problem in hand. Still curious whether there's a nice way to create an EqualityComparer on the fly though.

Galleries.SelectMany(x => x.Images).ToLookup(x => x.Key).Select(x => x.First());

Create lookup table and take 'top' from each one

Note: this is the same as @charlie suggested but using ILookup - which i think is what a group must be anyway.

Simon_Weaver
I agree that it feels like the framework is lacking something. I don't know if it is IEqualityComparer though ... it really needs both methods. It feels like there should be an easier way of using Distinct: an overload that takes a predicate.
Charlie Flowers
Not a predicate. I mean an overload of Distinct that would take T and let you select the value that you want to use for distinctness.
Charlie Flowers
@charlie - right, thats what i actually thought i WAS going to get with the existing Distinct(..). i'd just never used it in this context before, and of course it turned out not to be what i expected
Simon_Weaver
+2  A: 

What about a throw away IEqualityComparer generic class?

public class ThrowAwayEqualityComparer<T> : IEqualityComparer<T>
{
  Func<T, T, bool> comparer;

  public ThrowAwayEqualityComparer(Func<T, T, bool> comparer)
  {
    this.comparer = comparer;
  }

  public bool Equals(T a, T b)
  {
    return comparer(a, b);
  }

  public int GetHashCode(T a)
  {
    return a.GetHashCode();
  }
}

So now you can use Distinct.

var distinctImages = allImages.Distinct(
   new ThrowAwayEqualityComparer<GalleryImage>((a, b) => a.Key == b.Key));

You might be able to get away without the <GalleryImage>, but I'm not sure if the compiler could infer the type (don't have access to it right now.)

And in an additional extension method:

public static class IEnumerableExtensions
{
  public static IEnumerable<TValue> Distinct<TValue>(this IEnumerable<TValue> @this, Func<TValue, TValue, bool> comparer)
  {
    return @this.Distinct(new ThrowAwayEqualityComparer<TValue>(comparer));
  }

  private class ThrowAwayEqualityComparer...
}
Samuel
Pretty good. Then you could also implement the overload of Distinct that I wished for.
Charlie Flowers
Yes, you could easily do that and get what you wanted.
Samuel
But aren't you still implementing IEqualityComparer<T>. It sounded like you didn't want to do that.
Abhijeet Patel
Note that this won't necessarily work; there's no guarantee that the GetHashCode implementation you've supplied will be consistent with the Equals method. This could then give wrong results.
kvb
@abhijeet - sure this is still implementing IEqualityComparer, but this is meant for 'generic' use. it could be hidden away in a utility class and switched out for the framework version if microsoft ever added one.
Simon_Weaver
@kvb: If T's GetHashCode() doesn't match its Equals() method, then you're fucked. Because you couldn't do anything about it.
Samuel
Sure, that's true. But T's GetHashCode also may not match the Func<T,T,bool> method passed in to the constructor, in which case your class won't work properly. This will often be the case in practice (e.g. the class uses the default hash code implementation and you extract a key for comparison).
kvb
I'd say since he is using IEnumerable, you don't really need to concern yourself with GetHashCode(). It's really only used for a hash table.
Samuel
It's perfectly reasonable for methods such as IEnumerable.Distinct to use the GetHashCode() function to bin items before doing a presumably more expensive equality check. Your implementation does not fulfill IEqualityComparer's contract.
kvb
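The point about hash codes in the comments above can be illustrated with a small sketch. The Image and DelegateComparer types below are hypothetical stand-ins (not from the thread) that mirror the shape of ThrowAwayEqualityComparer: equality comes from a delegate, but the hash code comes from the object itself. Since Distinct is free to bucket items by hash code before it ever calls Equals, two objects with the same Key but different default hash codes will usually both survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a generated type that does not override
// Equals or GetHashCode, so its hash code is reference-based.
class Image
{
    public int Key { get; set; }
}

// Same shape as ThrowAwayEqualityComparer: Equals uses the delegate,
// GetHashCode falls back to the object's own hash code.
class DelegateComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> equals;

    public DelegateComparer(Func<T, T, bool> equals)
    {
        this.equals = equals;
    }

    public bool Equals(T a, T b)
    {
        return equals(a, b);
    }

    public int GetHashCode(T a)
    {
        return a.GetHashCode();
    }
}

static class Program
{
    static void Main()
    {
        // Two separate instances that share the same Key.
        var images = new[] { new Image { Key = 1 }, new Image { Key = 1 } };

        var distinct = images
            .Distinct(new DelegateComparer<Image>((a, b) => a.Key == b.Key))
            .ToList();

        // The two instances will almost always have different hash codes,
        // so the delegate is typically never consulted and both are kept.
        Console.WriteLine(distinct.Count);
    }
}
```

Returning a constant from GetHashCode would make the comparer correct again, at the cost of turning every lookup into a linear scan; hashing the extracted key, as in the key-based comparers elsewhere in this thread, avoids both problems.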
+1  A: 

Building on Charlie Flowers' answer, you can create your own extension method to do what you want which internally uses grouping:

    public static IEnumerable<T> Distinct<T, U>(
        this IEnumerable<T> seq, Func<T, U> getKey)
    {
        return
            from item in seq
            group item by getKey(item) into gp
            select gp.First();
    }

You could also create a generic class implementing IEqualityComparer<T>, but it sounds like you'd like to avoid this:

    public class KeyEqualityComparer<T,U> : IEqualityComparer<T>
    {
        private Func<T,U> GetKey { get; set; }

        public KeyEqualityComparer(Func<T,U> getKey) {
            GetKey = getKey;
        }

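        public bool Equals(T x, T y)
        {
            // Use the default comparer so that null keys do not throw.
            return EqualityComparer<U>.Default.Equals(GetKey(x), GetKey(y));
        }

        public int GetHashCode(T obj)
        {
            // Hash the key, so the hash code always agrees with Equals.
            return EqualityComparer<U>.Default.GetHashCode(GetKey(obj));
        }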
    }
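For completeness, here is a minimal usage sketch for a key-based comparer of this kind. The GalleryImage type is cut down to two properties and the sample data is made up; the comparer is a compact variant of the class above, repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down version of the GalleryImage type from the question.
class GalleryImage
{
    public int Key { get; set; }
    public string Caption { get; set; }
}

// Compact variant of the key-based comparer: both Equals and GetHashCode
// are derived from the extracted key, so they always agree.
class KeyEqualityComparer<T, U> : IEqualityComparer<T>
{
    private readonly Func<T, U> getKey;

    public KeyEqualityComparer(Func<T, U> getKey)
    {
        this.getKey = getKey;
    }

    public bool Equals(T x, T y)
    {
        return EqualityComparer<U>.Default.Equals(getKey(x), getKey(y));
    }

    public int GetHashCode(T obj)
    {
        return EqualityComparer<U>.Default.GetHashCode(getKey(obj));
    }
}

static class Program
{
    static void Main()
    {
        // Made-up sample data: two of the three images share Key = 1.
        var images = new List<GalleryImage>
        {
            new GalleryImage { Key = 1, Caption = "first" },
            new GalleryImage { Key = 2, Caption = "second" },
            new GalleryImage { Key = 1, Caption = "duplicate of first" },
        };

        var distinct = images
            .Distinct(new KeyEqualityComparer<GalleryImage, int>(x => x.Key))
            .ToList();

        Console.WriteLine(distinct.Count); // 2
    }
}
```

The type arguments have to be spelled out here because C# does not infer generic type arguments for constructors; a static factory method is the usual way around that.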
kvb
+1  A: 

Here's an interesting article that extends LINQ for this purpose... http://www.singingeels.com/Articles/Extending_LINQ__Specifying_a_Property_in_the_Distinct_Function.aspx

The default Distinct compares objects using the type's default equality comparer, which calls both GetHashCode and Equals. To make your objects work with Distinct directly, you could override both methods, but you mentioned that you are retrieving your objects from a web service, so you may not be able to do that in this case.

markt
+10  A: 

(There are two solutions here - see the end for the second one):

My MiscUtil library has a ProjectionEqualityComparer class (and two supporting classes to make use of type inference).

Here's an example of using it:

IEqualityComparer<GalleryImage> comparer = 
    ProjectionEqualityComparer<GalleryImage>.Create(x => x.Key);

Here's the code (comments removed)

// Helper class for construction
public static class ProjectionEqualityComparer
{
    public static ProjectionEqualityComparer<TSource, TKey>
        Create<TSource, TKey>(Func<TSource, TKey> projection)
    {
        return new ProjectionEqualityComparer<TSource, TKey>(projection);
    }

    public static ProjectionEqualityComparer<TSource, TKey>
        Create<TSource, TKey> (TSource ignored,
                               Func<TSource, TKey> projection)
    {
        return new ProjectionEqualityComparer<TSource, TKey>(projection);
    }
}

public static class ProjectionEqualityComparer<TSource>
{
    public static ProjectionEqualityComparer<TSource, TKey>
        Create<TKey>(Func<TSource, TKey> projection)
    {
        return new ProjectionEqualityComparer<TSource, TKey>(projection);
    }
}

public class ProjectionEqualityComparer<TSource, TKey>
    : IEqualityComparer<TSource>
{
    readonly Func<TSource, TKey> projection;
    readonly IEqualityComparer<TKey> comparer;

    public ProjectionEqualityComparer(Func<TSource, TKey> projection)
        : this(projection, null)
    {
    }

    public ProjectionEqualityComparer(
        Func<TSource, TKey> projection,
        IEqualityComparer<TKey> comparer)
    {
        projection.ThrowIfNull("projection");
        this.comparer = comparer ?? EqualityComparer<TKey>.Default;
        this.projection = projection;
    }

    public bool Equals(TSource x, TSource y)
    {
        if (x == null && y == null)
        {
            return true;
        }
        if (x == null || y == null)
        {
            return false;
        }
        return comparer.Equals(projection(x), projection(y));
    }

    public int GetHashCode(TSource obj)
    {
        if (obj == null)
        {
            throw new ArgumentNullException("obj");
        }
        return comparer.GetHashCode(projection(obj));
    }
}

Second solution

To do this just for Distinct, you can use the DistinctBy extension in MoreLINQ:

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector)
    {
        return source.DistinctBy(keySelector, null);
    }

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer)
    {
        source.ThrowIfNull("source");
        keySelector.ThrowIfNull("keySelector");
        return DistinctByImpl(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> DistinctByImpl<TSource, TKey>
        (IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer)
    {
        HashSet<TKey> knownKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            if (knownKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
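As a usage sketch: on .NET 6 and later, System.Linq itself ships an Enumerable.DistinctBy extension with the same keySelector shape, so the call below compiles there without MoreLINQ. On older frameworks the MoreLINQ method above would be called the same way. The GalleryImage type is cut down and the sample data is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down version of the GalleryImage type from the question.
class GalleryImage
{
    public int Key { get; set; }
    public string Caption { get; set; }
}

static class Program
{
    static void Main()
    {
        // Made-up sample data: two of the three images share Key = 1.
        var allImages = new List<GalleryImage>
        {
            new GalleryImage { Key = 1, Caption = "first" },
            new GalleryImage { Key = 2, Caption = "second" },
            new GalleryImage { Key = 1, Caption = "duplicate of first" },
        };

        // Requires .NET 6+ for the built-in Enumerable.DistinctBy.
        var distinctImages = allImages.DistinctBy(x => x.Key).ToList();

        Console.WriteLine(distinctImages.Count); // 2
    }
}
```

Because only the keys are stored in the set, this avoids building a group per key the way the GroupBy and ToLookup approaches do.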
Jon Skeet
Excellent info! Thanks!
Charlie Flowers
A: 

implement IEquatable on GalleryImage because it is generated

A different approach would be to generate GalleryImage as a partial class, and then have another file that adds the IEquatable<GalleryImage> interface along with the Equals and GetHashCode implementations.
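A rough sketch of that idea, assuming the generator can emit GalleryImage as a partial class (the two parts are shown in one listing here, but would normally live in separate files). The properties are cut down and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Part 1: stands in for the generated code (assumed to be declared partial).
public partial class GalleryImage
{
    public int Key { get; set; }
    public string Caption { get; set; }
}

// Part 2: hand-written file that survives regeneration.
public partial class GalleryImage : IEquatable<GalleryImage>
{
    public bool Equals(GalleryImage other)
    {
        return other != null && Key == other.Key;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as GalleryImage);
    }

    // Hash on the same property that Equals compares.
    public override int GetHashCode()
    {
        return Key.GetHashCode();
    }
}

static class Program
{
    static void Main()
    {
        var images = new List<GalleryImage>
        {
            new GalleryImage { Key = 1, Caption = "first" },
            new GalleryImage { Key = 1, Caption = "duplicate of first" },
            new GalleryImage { Key = 2, Caption = "second" },
        };

        // Plain Distinct() now uses the IEquatable implementation.
        Console.WriteLine(images.Distinct().Count()); // 2
    }
}
```

The trade-off is that this changes equality for GalleryImage everywhere it is used, not just in this one query.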

Richard