ansaurus

Question

LINQ Select Distinct while ignoring the XML field

Answer 1

A:

This extension method should return a list of items with only the first item from each set of duplicates in it...

public static IEnumerable<Tsource> RemoveDuplicates<Tkey, Tsource>(this IEnumerable<Tsource> source, Func<Tsource, Tkey> keySelector)
{
    var hashset = new HashSet<Tkey>();
    foreach (var item in source)
    {
        var key = keySelector(item);
        if (hashset.Add(key))
            yield return item;
    }
}

it would be used on a list like this list.RemoveDuplicates(x => x.UserID). If there were two records in List with the same userID, it would only return the first

TerrorAustralis 2010-10-14 02:39:25

Answer 2

+3 A:

Write an IEqualityComparer<T> implementation for the object which contains your XML type and pass it to Distinct. In the Equals method you can implement equality semantics however you wish.

This is a handy T4 code-generation template I wrote myself for generating IEqualityComparer<T> implementations for my team's domain models:

<#@ template language="C#v3.5" debug="True" #>
<#@ output extension=".generated.cs" #>
<#
    var modelNames = new string[] {
        "ClassName1",
        "ClassName2",
        "ClassName3",
    };

    var namespaceName = "MyNamespace";
#>
using System;
using System.Collections.Generic;

namespace <#= namespaceName #>
{
<#
    for (int i = 0; i < modelNames.Length; ++i)
    {
        string modelName = modelNames[i];
        string eqcmpClassName = modelName + "ByIDEqualityComparer";
#>
    #region <#= eqcmpClassName #>

    /// <summary>
    /// Use this EqualityComparer class to determine uniqueness among <#= modelName #> instances
    /// by using only checking the ID property.
    /// </summary>
    [System.Diagnostics.DebuggerNonUserCode]
    public sealed partial class <#= eqcmpClassName #> : IEqualityComparer<<#= modelName #>>
    {
        public bool Equals(<#= modelName #> x, <#= modelName #> y)
        {
            if ((x == null) && (y == null)) return true;
            if ((x == null) || (y == null)) return false;

            return x.ID.Equals(y.ID);
        }

        public int GetHashCode(<#= modelName #> obj)
        {
            if (obj == null) return 0;

            return obj.ID.GetHashCode();
        }
    }

    #endregion
<#
        if (i < modelNames.Length - 1) WriteLine(String.Empty);
    } // for (int i = 0; i < modelNames.Length; ++i)
#>
}

It makes the assumption that each of your model classes have a property named "ID" which is the primary key, stored as something that implements Equals. Our convention forces all of our models to have this property. If your models all have differently-named ID properties, consider modifying either this T4 template to suit your needs or better yet, make life easier on yourself (not just for the sake of using this T4) and change your models to use the "ID" name.

James Dunne 2010-10-14 02:51:47

The T4 template sure is handy, but to use the IEqualityComparer I have to 1st serailize both my queries to arrays (as linq2entities doesn't support the comparer) before I can remove duplicates. Nevertheless, its something that works for the time being, thanx!

TJB 2010-10-14 21:25:27

@TJB: Ah. I don't have any experience with LINQ-to-entities. I just use the `IEqualityComparer<T>` for LINQ-to-objects on in-memory collections for a quick distinct-by-ID. Glad to be of service! T4 templates rule :)

James Dunne 2010-10-15 00:22:00

Answer 3

+2 A:

as James Dunne said, you'd want to use an IEqualityComparer

a quick mock up would be something like this. You'll need to replace "ObjectType" with whatever type is in your subQuery1 and subQuery2 of course. please note this is untested:

List<ObjectType> listQueries = new List<ObjectType>();

ObjectTypeEqualityComparer objectTypeComparer = new ObjectTypeEqualityComparer();

listQueries.AddRange(subQuery1);// your first query
listQueries.AddRange(subQuery2); // your second query
ObjectType[] result = listQueries.Distinct(objectTypeComparer).ToArray();


class ObjectTypeEqualityComparer : IEqualityComparer<ObjectType>
{
    public bool Equals(ObjectType obj1, ObjectType obj2)
    {
        return obj1.UserId == obj2.UserId ?  true : false;
    }

    public int GetHashCode(ObjectType obj)
    {
        return obj.UserId.GetHashCode();
    }

}

jb 2010-10-14 04:06:37

Piggyback answer, eh? :) I was considering going back and updating my answer to include an example, but yours suffices just as well.

James Dunne 2010-10-14 14:23:32

haha, in my defense, i was gonna write this out before i saw your answer :)

jb 2010-10-16 02:35:15

Answer 4

+1 A:

You could use morelinq's DistinctBy. I suspect (but have not verified) that this, as well as the IEqualityComparer and RemoveDuplicates answers, will retrieve the duplicate records from SQL Server and then remove the duplicates on the client. If someone provides a server-side solution, I'd recommend accepting their answer.

TrueWill 2010-10-14 04:25:11

Right, I'm looking for something that can achieve the 'distinct' in sql rather than in post-processing

TJB 2010-10-14 05:16:29

ansaurus

tags:

views:

answers:

LINQ Select Distinct while ignoring the XML field

related questions