views:

2510

answers:

5

Right, so I have an enumerable and wish to get distinct values from it.

Using System.Linq, there's of course an extension method called Distinct. In the simple case, it can be used with no parameters, like:

var distinctValues = myStringList.Distinct();

Well and good, but if I have an enumerable of objects for which I need to specify equality, the only available overload is:

var distinctValues = myCustomerList.Distinct(someEqualityComparer);

The equality comparer argument must be an instance of IEqualityComparer<T>. I can do this, of course, but it's somewhat verbose and, well, cludgy.

What I would have expected is an overload that would take a lambda, say a Func<T, T, bool>:

var distinctValues
    = myCustomerList.Distinct((c1, c2) => c1.CustomerId == c2.CustomerId);

Anyone know if some such extension exists, or some equivalent workaround? Or am I missing something?

Alternatively, is there a way of specifying an IEqualityComparer inline (embarass me)?

+17  A: 

It looks to me like you want DistinctBy from MoreLINQ. You can then write:

var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);

Here's a cut-down version of DistinctBy (no nullity checking and no option to specify your own key comparer):

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
     (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> knownKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (knownKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
Jon Skeet
Thanks for the MoreLINQ tip. Nice.
Tor Haugen
+3  A: 

No there is no such extension method overload for this. I've found this frustrating myself in the past and as such I usually write a helper class to deal with this problem. The goal is to convert a Func<T,T,bool> to IEqualityComparer<T,T>.

Example

public class EqualityFactory {
  private sealed class Impl<T> : IEqualitComparer<T,T> {
    private Func<T,T,bool> m_del;
    private IEqualityComparer<T> m_comp;
    public Impl(Func<T,T,bool> del) { 
      m_del = del;
      m_comp = EqualityComparer<T>.Default;
    }
    public bool Equals(T left, T right) {
      return m_del(left, right);
    } 
    public int GetHashcode(T value) {
      return m_comp.GetHashcode(value);
    }
  }
  public static IEqualityComparer<T,T> Create<T>(Func<T,T,bool> del) {
    return new Impl<T>(del);
  }
}

This allows you to write the following

var distinctValues = myCustomerList
  .Distinct(EqualityFactory.Create((c1, c2) => c1.CustomerId == c2.CustomerId));
JaredPar
That has a nasty hash code implementation though. It's easier to create an `IEqualityComparer<T>` from a projection: http://stackoverflow.com/questions/188120/can-i-specify-my-explicit-type-comparator-inline
Jon Skeet
(Just to explain my comment about the hash code - it's very easy with this code to end up with Equals(x, y) == true, but GetHashCode(x) != GetHashCode(y). That basically breaks anything like a hashtable.)
Jon Skeet
I agree with the hash code objection. Still, +1 for the pattern.
Tor Haugen
@Jon, yeah I agree the original implementation of GetHashcode is less than optimal (was being lazy). I switched it to essentially use now EqualityComparer<T>.Default.GetHashcode() which is slightly more standard. Truthfully though, the only guaranteed to work GetHashcode implementation in this scenario is to simply return a constant value. Kills hashtable lookup but is guaranteed to be functionally correct.
JaredPar
@JaredPar: Exactly. The hash code has to be consistent with the equality function you're using, which presumably *isn't* the default one otherwise you wouldn't bother :) That's why I prefer to use a projection - you can get both equality and a sensible hash code that way. It also makes the calling code have less duplication. Admittedly it only works in cases where you want the same projection twice, but that's every case I've seen in practice :)
Jon Skeet
A: 

I'm assuming you have an IEnumerable, and in your example delegate, you would like c1 and c2 to be referring to two elements in this list?

I believe you could achieve this with a self join var distinctResults = from c1 in myList join c2 in myList on

MattH
+1  A: 

Something I have used which worked well for me.

/// <summary>
/// A class to wrap the IEqualityComparer interface into matching functions for simple implementation
/// </summary>
/// <typeparam name="T">The type of object to be compared</typeparam>
public class MyIEqualityComparer<T> : IEqualityComparer<T>
{
    /// <summary>
    /// Create a new comparer based on the given Equals and GetHashCode methods
    /// </summary>
    /// <param name="equals">The method to compute equals of two T instances</param>
    /// <param name="getHashCode">The method to compute a hashcode for a T instance</param>
    public MyIEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        if (equals == null)
            throw new ArgumentNullException("equals", "Equals parameter is required for all MyIEqualityComparer instances");
        EqualsMethod = equals;
        GetHashCodeMethod = getHashCode;
    }
    /// <summary>
    /// Gets the method used to compute equals
    /// </summary>
    public Func<T, T, bool> EqualsMethod { get; private set; }
    /// <summary>
    /// Gets the method used to compute a hash code
    /// </summary>
    public Func<T, int> GetHashCodeMethod { get; private set; }

    bool IEqualityComparer<T>.Equals(T x, T y)
    {
        return EqualsMethod(x, y);
    }

    int IEqualityComparer<T>.GetHashCode(T obj)
    {
        if (GetHashCodeMethod == null)
            return obj.GetHashCode();
        return GetHashCodeMethod(obj);
    }
}
Kleinux
A: 

This will do what you want but I don't know about performance:

var distinctValues =
    from cust in myCustomerList
    group cust by cust.CustomerId
    into gcust
    select gcust.First();

At least it's not verbose.

Gordon Freeman