I hate posting this since it's somewhat subjective, but it feels like there's a better method to do this that I'm just not thinking of.

Sometimes I want to 'distinct' a collection by certain columns/properties but without throwing away other columns (yes, this does lose information, as it becomes arbitrary which values of those other columns you'll end up with).

Note that this extension is less powerful than the Distinct overload that takes an IEqualityComparer<T>, since a custom comparer can express much more complex comparison logic, but this is all I need for now :)

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> getKeyFunc)
{
    return from s in source
            group s by getKeyFunc(s) into sourceGroups
            select sourceGroups.First();
}

Example usage:

var items = new[]
{
    new { A = 1, B = "foo", C = Guid.NewGuid(), },
    new { A = 2, B = "foo", C = Guid.NewGuid(), },
    new { A = 1, B = "bar", C = Guid.NewGuid(), },
    new { A = 2, B = "bar", C = Guid.NewGuid(), },
};

var itemsByA = items.DistinctBy(item => item.A).ToList();
var itemsByB = items.DistinctBy(item => item.B).ToList();
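For reference, here is a minimal sketch of what that usage returns. The method is renamed to DistinctByGrouping (a hypothetical name) only so that it cannot clash with the Enumerable.DistinctBy that ships with .NET 6 and later, and the Guid column is dropped to keep the output readable. The wrapper class and Demo method are also just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDistinctExample
{
    // Same GroupBy approach as above, under a hypothetical name.
    public static IEnumerable<TSource> DistinctByGrouping<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> getKeyFunc)
    {
        // GroupBy yields groups in order of each key's first appearance and keeps
        // source order inside a group, so First() is the earliest item per key.
        return source.GroupBy(getKeyFunc).Select(g => g.First());
    }

    public static string[] Demo()
    {
        var items = new[]
        {
            new { A = 1, B = "foo" },
            new { A = 2, B = "foo" },
            new { A = 1, B = "bar" },
            new { A = 2, B = "bar" },
        };

        var itemsByA = items.DistinctByGrouping(item => item.A);
        var itemsByB = items.DistinctByGrouping(item => item.B);

        return new[]
        {
            string.Join(", ", itemsByA.Select(i => i.A + "/" + i.B)), // "1/foo, 2/foo"
            string.Join(", ", itemsByB.Select(i => i.A + "/" + i.B)), // "1/foo, 1/bar"
        };
    }
}
```

So with this implementation the "arbitrary" survivor is in practice the first item seen for each key.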
+1  A: 

I've previously written a generic Func => IEqualityComparer utility class just for the purpose of being able to call overloads of LINQ methods that accept an IEqualityComparer without having to write a custom class each time.

It uses a delegate (just like your example) to supply the comparison semantics. This lets me use the library's built-in implementations, which I presume are more likely to be correct and efficient than anything I'd roll myself.

public static class ComparerExt
{
    private class GenericEqualityComparer<T> : IEqualityComparer<T>
    {
        private readonly Func<T, T, bool> m_CompareFunc;

        public GenericEqualityComparer( Func<T,T,bool> compareFunc ) {
            m_CompareFunc = compareFunc;
        }

        public bool Equals(T x, T y) {
            return m_CompareFunc(x, y);
        }

        public int GetHashCode(T obj) {
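            // Don't override hashing semantics. Caveat: Distinct only calls Equals on
            // items whose hash codes match, so compareFunc must agree with this hash.
            return obj.GetHashCode();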
        }
    }

    public static IEqualityComparer<T> Compare<T>( Func<T,T,bool> compareFunc ) {
        return new GenericEqualityComparer<T>(compareFunc);
    }
}

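You can use it like so:

// Item is a placeholder for the element type of list. The type argument must be
// explicit because T cannot be inferred from an untyped lambda.
var result = list.Distinct( ComparerExt.Compare<Item>( (a,b) => { /*whatever*/ } ) );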
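One caveat with a delegate-only comparer: Distinct buckets items by hash code before it ever calls Equals, so two items that the delegate considers equal but that hash differently will both survive. A possible way around that, sketched below, is to build the comparer from a key selector so that Equals and GetHashCode are both derived from the same key. KeyEqualityComparer and the demo class are hypothetical names, not part of the ComparerExt class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: equality and hashing both come from the selected key.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> m_KeySelector;
    private readonly IEqualityComparer<TKey> m_KeyComparer;

    public KeyEqualityComparer(
        Func<T, TKey> keySelector, IEqualityComparer<TKey> keyComparer = null)
    {
        if (keySelector == null)
            throw new ArgumentNullException("keySelector");

        m_KeySelector = keySelector;
        m_KeyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    }

    public bool Equals(T x, T y)
    {
        return m_KeyComparer.Equals(m_KeySelector(x), m_KeySelector(y));
    }

    public int GetHashCode(T obj)
    {
        // Hash the key, not the object, so items with equal keys share a bucket.
        TKey key = m_KeySelector(obj);
        return key == null ? 0 : m_KeyComparer.GetHashCode(key);
    }
}

public static class KeyEqualityComparerDemo
{
    public static List<string> Demo()
    {
        var words = new[] { "apple", "avocado", "banana", "blueberry", "cherry" };

        // Treat words as equal when they start with the same letter.
        var byInitial = new KeyEqualityComparer<string, char>(w => w[0]);
        return words.Distinct(byInitial).ToList(); // one word per initial letter
    }
}
```

The trade-off is that this only covers comparisons expressible as "same key", whereas the Func<T,T,bool> version accepts any predicate but relies on the type's own hash codes lining up with it.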

I also often throw in a Reverse() method to allow for changing the ordering of operands in the comparison, like so:

private class GenericComparer<T> : IComparer<T>
{
    private readonly Func<T, T, int> m_CompareFunc;
    public GenericComparer( Func<T,T,int> compareFunc ) {
        m_CompareFunc = compareFunc;
    }
    public int Compare(T x, T y) {
        return m_CompareFunc(x, y);
    }
}

public static IComparer<T> Reverse<T>( this IComparer<T> comparer )
{
    return new GenericComparer<T>((a, b) => comparer.Compare(b, a));
}
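
As a rough usage sketch of the Reverse() idea, the snippet below sorts in descending order by reversing the default comparer. To stay self-contained it builds the comparer with the framework's Comparer<T>.Create (available from .NET 4.5) instead of the private GenericComparer class. The wrapper class and Demo method are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseComparerDemo
{
    // Same operand swap as Reverse() above, built on Comparer<T>.Create.
    public static IComparer<T> Reverse<T>(this IComparer<T> comparer)
    {
        return Comparer<T>.Create((a, b) => comparer.Compare(b, a));
    }

    public static List<int> Demo()
    {
        var numbers = new[] { 3, 1, 2 };

        // OrderBy with the reversed default comparer sorts descending.
        return numbers
            .OrderBy(n => n, Comparer<int>.Default.Reverse())
            .ToList(); // 3, 2, 1
    }
}
```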
LBushkin
Distinct takes an IEqualityComparer, not an IComparer... otherwise, that's a good solution ;)
Thomas Levesque
@Thomas - You're right. I wasn't paying attention to which part of the ComparerExt class I was pasting (as you can imagine, it has a bunch of other useful methods that I was trying to exclude for the purpose of brevity). I've corrected my answer.
LBushkin
Any chance you could post the entire ComparerExt class somewhere? Sounds very useful :)
James Manning
btw, I marked Luke's answer because it was more targeted for the specific question, but I really like your GenericEqualityComparer !
James Manning
+2  A: 

Here you go. I don't think this is massively more efficient than your own version, but it should have a slight edge. It only requires a single pass through the sequence, yielding each item as it goes, rather than needing to group the entire sequence first.

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
    this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    return source.DistinctBy(keySelector, null);
}

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
    if (source == null)
        throw new ArgumentNullException("source");

    if (keySelector == null)
        throw new ArgumentNullException("keySelector");

    return source.DistinctByIterator(keySelector, keyComparer);
}

private static IEnumerable<TSource> DistinctByIterator<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
{
    var keys = new HashSet<TKey>(keyComparer);

    foreach (TSource item in source)
    {
        if (keys.Add(keySelector(item)))
            yield return item;
    }
}
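
To illustrate the two practical differences from the grouping version (an optional key comparer and streaming output), here is a small sketch. It restates the iterator compactly under the hypothetical name DistinctByStreaming so that it cannot clash with the Enumerable.DistinctBy that ships with .NET 6 and later, and it skips the argument checks for brevity. The demo methods and the Naturals() helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDistinctDemo
{
    // Compact restatement of the HashSet-based iterator above (hypothetical name).
    public static IEnumerable<TSource> DistinctByStreaming<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer = null)
    {
        // A null comparer makes HashSet fall back to the default equality comparer.
        var keys = new HashSet<TKey>(keyComparer);

        foreach (TSource item in source)
        {
            if (keys.Add(keySelector(item)))
                yield return item;
        }
    }

    // Endless sequence 0, 1, 2, ...
    private static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
            yield return i;
    }

    public static List<string> CaseInsensitiveDemo()
    {
        var names = new[] { "Foo", "foo", "BAR", "bar", "Baz" };

        // Keys are compared case-insensitively; the first spelling seen wins.
        return names
            .DistinctByStreaming(n => n, StringComparer.OrdinalIgnoreCase)
            .ToList(); // "Foo", "BAR", "Baz"
    }

    public static List<int> LazyDemo()
    {
        // Items are yielded as they are found, so Take(3) can stop an endless
        // source. A GroupBy-based version would never finish here.
        return Naturals()
            .DistinctByStreaming(n => n % 3)
            .Take(3)
            .ToList(); // 0, 1, 2
    }
}
```

Note that the set of seen keys still grows with the number of distinct keys, so the saving is in latency and in not buffering whole groups rather than in eliminating memory use altogether.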
LukeH
+1 That's how I'd do it ;)
Dan Tao