tags:

views:

2757

answers:

2

Here is an interesting issue I noticed when using the Except Operator: I have list of users from which I want to exclude some users:

The list of users is coming from an XML file:

The code goes like this:

interface IUser { int ID { get; set; } string Name { get; set; } }

class User: IUser
{

    #region IUser Members

    public int ID
    {
        get;
        set;
    }

    public string Name
    {
        get;
        set;
    }

    #endregion

    public override string ToString()
    {
        return ID + ":" +Name;
    }


    public static IEnumerable<IUser> GetMatchingUsers(IEnumerable<IUser> users)
    {
         IEnumerable<IUser> localList = new List<User>
         {
            new User{ ID=4, Name="James"},
            new User{ ID=5, Name="Tom"}

         }.OfType<IUser>();
         var matches = from u in users
                       join lu in localList
                           on u.ID equals lu.ID
                       select u;
         return matches;
    }
}

class Program
{
    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load("Users.xml");
        IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select
            (u => new User
                { ID = (int)u.Attribute("id"),
                  Name = (string)u.Attribute("name")
                }
            ).OfType<IUser>();       //still a query, objects have not been materialized


        var matches = User.GetMatchingUsers(users);
        var excludes = users.Except(matches);    // excludes should contain 6 users but here it contains 8 users

    }
}

When I call User.GetMatchingUsers(users) I get 2 matches as expected. The issue is that when I call users.Except(matches) The matching users are not being excluded at all! I am expecting 6 users ut "excludes" contains all 8 users instead.

Since all I'm doing in GetMatchingUsers(IEnumerable users) is taking the IEnumerable and just returning the IUsers whose ID's match( 2 IUsers in this case), my understanding is that by default "Except" will use reference equality for comparing the objects to be excluded. Is this not how "Except" behaves?

What is even more interesting is that if I materialize the objects using .ToList() and then get the matching users, and call "Except", everything works as expected!

Like so:

IEnumerable users = doc.Element("Users").Elements("User").Select (u => new User { ID = (int)u.Attribute("id"), Name = (string)u.Attribute("name") } ).OfType().ToList(); //explicity materializing all objects by calling ToList()

var matches = User.GetMatchingUsers(users); var excludes = users.Except(matches); // excludes now contains 6 users as expected

I don't see why I should need to materialize objects for calling "Except" given that its defined on IEnumerable?

Any suggesstions / insights would be much appreciated.

+3  A: 

I think I know why this fails to work as expected. Because the initial user list is a LINQ expression, it is re-evaluated each time it is iterated (once when used in GetMatchingUsers and again when doing the Except operation) and so, new user objects are created. This would lead to different references and so no matches. Using ToList fixes this because it iterates the LINQ query once only and so the references are fixed.

I've been able to reproduce the problem you have and having investigated the code, this seems like a very plausible explanation. I haven't proved it yet, though.

Update
I just ran the test but outputting the users collection before the call to GetMatchingUsers, in that call, and after it. Each time the hash code for the object was output and they do indeed have different values each time indicating new objects, as I suspected.

Here is the output for each of the calls:

==> Start
ID=1, Name=Jeff, HashCode=39086322
ID=2, Name=Alastair, HashCode=36181605
ID=3, Name=Anthony, HashCode=28068188
ID=4, Name=James, HashCode=33163964
ID=5, Name=Tom, HashCode=14421545
ID=6, Name=David, HashCode=35567111
<== End
==> Start
ID=1, Name=Jeff, HashCode=65066874
ID=2, Name=Alastair, HashCode=34160229
ID=3, Name=Anthony, HashCode=63238509
ID=4, Name=James, HashCode=11679222
ID=5, Name=Tom, HashCode=35410979
ID=6, Name=David, HashCode=57416410
<== End
==> Start
ID=1, Name=Jeff, HashCode=61940669
ID=2, Name=Alastair, HashCode=15193904
ID=3, Name=Anthony, HashCode=6303833
ID=4, Name=James, HashCode=40452378
ID=5, Name=Tom, HashCode=36009496
ID=6, Name=David, HashCode=19634871
<== End

And, here is the modified code to show the problem:

using System.Xml.Linq;
using System.Collections.Generic;
using System.Linq;
using System;

interface IUser
{
    int ID
    {
        get;
        set;
    }
    string Name
    {
        get;
        set;
    }
}

class User : IUser
{

    #region IUser Members

    public int ID
    {
        get;
        set;
    }

    public string Name
    {
        get;
        set;
    }

    #endregion

    public override string ToString()
    {
        return ID + ":" + Name;
    }


    public static IEnumerable<IUser> GetMatchingUsers(IEnumerable<IUser> users)
    {
        IEnumerable<IUser> localList = new List<User>
         {
            new User{ ID=4, Name="James"},
            new User{ ID=5, Name="Tom"}

         }.OfType<IUser>();

        OutputUsers(users);
        var matches = from u in users
                      join lu in localList
                          on u.ID equals lu.ID
                      select u;
        return matches;
    }

    public static void OutputUsers(IEnumerable<IUser> users)
    {
        Console.WriteLine("==> Start");
        foreach (IUser user in users)
        {
            Console.WriteLine("ID=" + user.ID.ToString() + ", Name=" + user.Name + ", HashCode=" + user.GetHashCode().ToString());
        }
        Console.WriteLine("<== End");
    }
}

class Program
{
    static void Main(string[] args)
    {
        XDocument doc = new XDocument(
            new XElement(
                "Users",
                new XElement("User", new XAttribute("id", "1"), new XAttribute("name", "Jeff")),
                new XElement("User", new XAttribute("id", "2"), new XAttribute("name", "Alastair")),
                new XElement("User", new XAttribute("id", "3"), new XAttribute("name", "Anthony")),
                new XElement("User", new XAttribute("id", "4"), new XAttribute("name", "James")),
                new XElement("User", new XAttribute("id", "5"), new XAttribute("name", "Tom")),
                new XElement("User", new XAttribute("id", "6"), new XAttribute("name", "David"))));
        IEnumerable<IUser> users = doc.Element("Users").Elements("User").Select
            (u => new User
            {
                ID = (int)u.Attribute("id"),
                Name = (string)u.Attribute("name")
            }
            ).OfType<IUser>();       //still a query, objects have not been materialized


        User.OutputUsers(users);
        var matches = User.GetMatchingUsers(users);
        User.OutputUsers(users);
        var excludes = users.Except(matches);    // excludes should contain 6 users but here it contains 8 users

    }
}
Jeff Yates
If that is the case, then wouldn't the "new" objects be passed into GetMatchingUsers every single time ? Also that method returns a query as the result and not objects. Just my 2 cents...
Abhijeet Patel
No, because the expression is evaluated each time it is used. In my code, which shows this, it is evaluated by my output before the call to GetMatchingUsers, then again when calling GetMatchingUSers, and importantly, again during the Except.
Jeff Yates
Because the evaluation for GetMatchingUsers and the Except both generate their own instances, the Except fails to work as you expect.
Jeff Yates
It doesn't matter if GetMatchingUsers returns a query or not, the instances returned by the users enumerable are still different on EACH evaluation.
Jeff Yates
I see.. so it seems that for all set related operators(extension methods) the underlying objects would have to be materialized,unless an implementation of IEquatable<T> for the IUser interface has been passed as a parameter?
Abhijeet Patel
It looks that way, in this situation, yes.
Jeff Yates
You basically need to ensure that the evaluation of the original expression is only performed once so that the results aren't instantiated multiple times, or that the comparison doesn't rely on the object reference.
Jeff Yates
Makes perfect sense... This one was particularly tricky. Thanks for your help
Abhijeet Patel
No problem. It was very intriguing. :)
Jeff Yates
+1  A: 

I think you should implement IEquatable<T> to provide your own Equals and GetHashCode methods.

From MSDN (Enumerable.Except):

If you want to compare sequences of objects of some custom data type, you have to implement the IEqualityComparer<(Of <(T>)>) generic interface in your class. The following code example shows how to implement this interface in a custom data type and provide GetHashCode and Equals methods.

CMS
But the code he has should work. Why isn't it working?
Jeff Yates
CMS:I've implemented IEqualtable<T> in my production code and that DOES work. What I fail to understand is that why is it that explicitly calling ToList() on query BEFORE calling GetMatching Users produces the desired effect instead of leaving the users variable as query
Abhijeet Patel
Jeff: I'm not returning the IUsers from the local list I've created inside GetMatchingUser, The method returns IUsers from the original IEnumerable<IUser>, so the references should still be to the original IUser objects behind the scenes so reference equality should have worked as expected!
Abhijeet Patel
@Abhijeet: Yes, I saw that. Hence deleting my answer. I'm taking a closer look myself. I've reproduced what you are seeing.
Jeff Yates
Abhijeet Patel