views:

56

answers:

2

I have two collections a and b. I would like to compute the set of items in either a or b, but not in both (a logical exclusive or). With LINQ, I can come up with this:

IEnumerable<T> Delta<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    return a.Except (b).Union (b.Except (a));
}

I wonder if there are other more efficient or more compact ways of producing the difference between the two collections.

Edit 1: Jon Skeet posted a first solution which does not preserve the order of the items by relying on a HashSet. I wonder if there are other approaches which would preserve the order of a and b in the output.

+4  A: 

Use HashSet<T> directly - it has a SymmetricExceptWith method:

HashSet<T> data = new HashSet<T>(a);
data.SymmetricExceptWith(b);

EDIT: If you want to maintain the order, here's an alternative:

HashSet<T> data = new HashSet<T>(a);
data.IntersectWith(b);
foreach (T t in a.Concat(b))
{
    if (!data.Contains(t))
    {
        yield return t;
    }
}

This has the following important differences:

  • Both a and b are iterated over twice. In some cases that could be a very bad thing - you could call ToList on each of them to start with to retain a buffer.
  • If there are duplicates in either a or b, they will be yielded multiple times. If you wanted to avoid this you could keep a set of already-yielded values. At this point, it would be equivalent to:

    a.Concat(b).Except(a.Intersect(b))
    

That's still only two set operations instead of the three in your original code though.

Jon Skeet
Thanks Jon for your quick reply. HashSet work fine as long as you are not interested in the original order of the items. What if I want to keep the order of the items in `a` and `b` in the difference?
Pierre
@Pierre: I've edited my answer with another couple of options.
Jon Skeet
@Jon: Thanks a lot for your time. In my case, `a` and `b` do not contain duplicates, so this is not a concern. The LINQ expression you propose is much more readable (and therefore maintainable) than the piece of code involving the `HashSet`. I like it!
Pierre
+1  A: 

Given a.Except(b) and b.Except(a) are disjoint, you can use concat instead of union, saving a set operator (and concat is more efficient).

return a.Except (b).Concat (b.Except (a));

This still runs through each list twice.

Cameron MacFarland
@Cameron: Thank you; you are right, `Concat` will be faster than `Union` since my inputs are disjoint; I had overlooked that point.
Pierre