I need to determine whether or not two sets contain exactly the same elements. The ordering does not matter.

For instance, these two arrays should be considered equal:

    IEnumerable<int> data = new[] { 3, 5, 6, 9 };
    IEnumerable<int> otherData = new[] { 6, 5, 9, 3 };

Neither set may contain an element that is not in the other.

Can this be done using the built-in query operators? And what would be the most efficient way to implement it, considering that the number of elements could range from a few to hundreds?

A: 

Hi, this should help:

    IEnumerable<int> data = new []{ 3,5,6,9 };
    IEnumerable<int> otherData = new[] {6, 5, 9, 3};

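    // Check both directions; a one-way All/Contains test would also accept a subset.
    if (data.All(x => otherData.Contains(x)) && otherData.All(x => data.Contains(x)))
    {
        // Code goes here
    }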
Blounty
It's O(n²) in complexity. Dangerous if you have more than a few dozen items in your list.
Yann Schwartz
Simple, but this will not perform well enough for my scenario.
driis
A: 

If you might have duplicates (or if you want a solution which performs better for longer lists), I'd try something like this:

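    static bool IsSame<T>(IEnumerable<T> set1, IEnumerable<T> set2)
    {
        // Two null references count as equal; one null and one non-null do not.
        if (set1 == null && set2 == null)
            return true;
        if (set1 == null || set2 == null)
            return false;

        // Copy into lists so the originals are not reordered.
        List<T> list1 = set1.ToList();
        List<T> list2 = set2.ToList();

        // Cheap early exit before the more expensive sort.
        if (list1.Count != list2.Count)
            return false;

        // Sort() uses the default comparer, so T must be comparable.
        list1.Sort();
        list2.Sort();

        return list1.SequenceEqual(list2);
    }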

UPDATE: oops, you guys are right: the Except() solution below needs to look both ways before crossing the street. And it has lousy perf for longer lists. Ignore the suggestion below! :-)

Here's one easy way to do it. Note that this assumes the lists have no duplicates.

    bool same = data.Except(otherData).Count() == 0;
Justin Grant
You could use .Any() rather than Count() - then it won't enumerate every item in the list.
Matt Breckon
What if `data = {1,2}, otherData = {1,2,3}`? You should also check the other way around.
Kobi
This won't work in my scenario, without checking both ways as suggested by Kobi. And with a few hundred elements, I would be worried about performance for that approach.
driis
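Putting the two comments above together (Any() instead of Count(), and checking both directions), a minimal sketch might look like the following. The class and method names are hypothetical, and since Except works with set semantics, duplicates are still ignored.

```csharp
using System.Collections.Generic;
using System.Linq;

static class ExceptDemo
{
    // Hypothetical helper: true when neither sequence has an element the other lacks.
    // Duplicates are ignored because Except yields a distinct set difference.
    public static bool SameElements<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        // Any() stops at the first leftover element instead of counting them all.
        return !a.Except(b).Any() && !b.Except(a).Any();
    }
}
```

With `data = {1,2}` and `otherData = {1,2,3}` the second check finds the leftover 3, so the result is false.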
A: 
  1. First, check the length. If they differ, the sets are different.
  2. You can do data.Intersect(otherData) and check that the result has the same length.
  3. Or simply sort the sets and iterate through them.
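
Steps 1 and 2 could be sketched roughly as follows. The helper name is made up for illustration, and the sketch assumes neither input contains duplicates; with duplicates, Intersect would return fewer elements than the input and the check would fail even for identical inputs.

```csharp
using System.Collections.Generic;
using System.Linq;

static class IntersectDemo
{
    // Hypothetical helper; assumes neither collection contains duplicates.
    public static bool SameSet<T>(ICollection<T> a, ICollection<T> b)
    {
        // Step 1: collections of different sizes cannot be equal sets.
        if (a.Count != b.Count)
            return false;

        // Step 2: every element must survive the intersection.
        return a.Intersect(b).Count() == a.Count;
    }
}
```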
Kobi
+3  A: 

I suggest sorting both, and doing an element-by-element comparison.

    data.OrderBy(x => x).SequenceEqual(otherData.OrderBy(x => x))

I'm not sure how fast the implementation of OrderBy is, but if it's an O(n log n) sort like you'd expect, the total algorithm is O(n log n) as well.

For some cases of data, you can improve on this by using a custom implementation of OrderBy that for example uses a counting sort, for O(n+k), with k the size of the range wherein the values lie.
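
As a rough illustration of the counting idea, here is a sketch that skips the sort entirely and just tallies occurrences, in the spirit of counting sort rather than a drop-in OrderBy replacement. The helper name and the `min`/`max` parameters are assumptions for the example: the caller must know the value range up front, and a value outside it would throw IndexOutOfRangeException.

```csharp
using System.Collections.Generic;

static class CountingDemo
{
    // Hypothetical helper; assumes every value lies in [min, max].
    // Runs in O(n + k), where k = max - min + 1 is the size of the value range.
    public static bool SameMultiset(IEnumerable<int> a, IEnumerable<int> b, int min, int max)
    {
        var counts = new int[max - min + 1];

        // Count up for the first sequence, down for the second.
        foreach (int x in a) counts[x - min]++;
        foreach (int x in b) counts[x - min]--;

        // Any non-zero bucket means some value occurs a different number of times.
        foreach (int c in counts)
            if (c != 0) return false;

        return true;
    }
}
```

Unlike the set-based approaches, this treats duplicates as significant.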

Joren
It's misleading to say counting sort is `O(n)`. It's `O(n+M)` where M is the length of the counting array.
Mehrdad Afshari
Thanks, edited.
Joren
+13  A: 

If you want to treat the arrays as "sets" and ignore duplicate items, you can use the HashSet<T>.SetEquals method:

    var isEqual = new HashSet<int>(first).SetEquals(second);

Otherwise, your best bet is probably sorting both sequences in the same way and using SequenceEqual to compare them.
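
To make the difference between the two options concrete, here is a small sketch with both side by side. The class and method names are only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SetEqualsDemo
{
    // Set semantics: duplicates are ignored.
    public static bool EqualAsSets(IEnumerable<int> first, IEnumerable<int> second)
    {
        return new HashSet<int>(first).SetEquals(second);
    }

    // Multiset semantics: duplicates count, so both sequences are sorted and compared.
    public static bool EqualAsMultisets(IEnumerable<int> first, IEnumerable<int> second)
    {
        return first.OrderBy(x => x).SequenceEqual(second.OrderBy(x => x));
    }
}
```

For `first = {3, 5, 5}` and `second = {5, 3}`, EqualAsSets returns true while EqualAsMultisets returns false.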

Mehrdad Afshari
I think HashSet<T>.SetEquals was the method I was looking for :-)
driis
Good answer, I forgot about SetEquals! If you may have dupes, before sorting you should probably copy the sequences into a List and compare the lengths first. This saves you the (expensive) sorting or SequenceEqual() operations in case the lengths are different.
Justin Grant
@Justin Grant - I don't follow... You need to remove duplicates before you compare lengths, and this is just as expensive as sorting.
Kobi