
What is the fastest way to find out whether two ICollection<T> collections contain precisely the same entries? Brute force is obvious; I was wondering whether there is a more elegant method.

We are using C# 2.0, so no extension methods if possible, please!

Edit: answers would be interesting for both ordered and unordered collections, and would hopefully be different for each...

A: 

Brute force takes O(n), comparing all elements pairwise (assuming both collections are sorted), which I would think is the best you can do unless some property of the data makes it easier.

For the unsorted case, I guess it's O(n*n).
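For reference, the O(n*n) unsorted brute force might look like this minimal sketch (C# 2.0 style, no LINQ), assuming only that T has a meaningful Equals. The class name QuadraticCompare is hypothetical. Each item of the first collection is matched against, and removed from, a copy of the second, so duplicates are counted correctly.

```csharp
using System;
using System.Collections.Generic;

public static class QuadraticCompare
{
    // Unordered brute-force comparison: O(n*n), needs only Equals
    // (List<T>.IndexOf uses EqualityComparer<T>.Default).
    public static bool UnorderedEquals<T>(ICollection<T> a, ICollection<T> b)
    {
        if (a.Count != b.Count)
            return false;

        // Work on a copy so the caller's collection is not modified.
        List<T> remaining = new List<T>(b);

        foreach (T item in a)
        {
            int index = remaining.IndexOf(item); // linear scan: the O(n) inner loop
            if (index < 0)
                return false;                    // no unmatched partner left in b
            remaining.RemoveAt(index);           // consume the match so duplicates count
        }
        return true; // equal counts and every item of a was matched
    }
}
```

This needs no hashing or ordering at all, which is why it is the fallback when T offers neither.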

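In the unsorted case, I would think a solution based around a merge sort would probably help: sorting both collections first costs O(n log n), after which the sorted comparison above is O(n).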
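A minimal sketch of the sort-then-compare idea, assuming T has a total ordering through Comparer<T>.Default (for example, T implements IComparable<T>). SortedCompare is a hypothetical name. Note that List<T>.Sort is quicksort-based rather than a merge sort; any O(n log n) sort gives the same overall bound.

```csharp
using System;
using System.Collections.Generic;

public static class SortedCompare
{
    // Unordered comparison: sort copies of both collections, then compare pairwise.
    // Throws InvalidOperationException from Sort if T has no default ordering.
    public static bool UnorderedEquals<T>(ICollection<T> a, ICollection<T> b)
    {
        if (a.Count != b.Count)
            return false;

        List<T> sortedA = new List<T>(a);
        List<T> sortedB = new List<T>(b);
        sortedA.Sort(); // uses Comparer<T>.Default
        sortedB.Sort();

        // Use the same comparer for equality so "sorted order" and "equal" agree.
        Comparer<T> comparer = Comparer<T>.Default;
        for (int i = 0; i < sortedA.Count; i++)
        {
            if (comparer.Compare(sortedA[i], sortedB[i]) != 0)
                return false;
        }
        return true;
    }
}
```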

For example, could you re-model it so that there is only one collection? Or three collections: one for items only in A, one for items only in B, and one for items in both. If the A-only and B-only collections are empty, then A and B are the same... I am probably going off on totally the wrong tangent here...

Chris Kimpton
+1  A: 

You mean the same entries or the same entries in the same order?

Anyway, assuming you want to check whether they contain the same entries in the same order, "brute force" is really your only option in C# 2.0. I know what you mean by inelegant, but if each individual comparison is O(1), the whole process is O(N), which is not that bad.

DrJokepu
+1  A: 

If the entries need to be in the same order (besides being the same), then I suggest, as an optimization, that you iterate both collections at the same time and compare the current entry in each. Otherwise, brute force is the way to go.
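Iterating both collections in lockstep could be sketched like this in C# 2.0 (no LINQ), with a Count check as a cheap early exit. OrderedCompare is a hypothetical name. It is only meaningful when both collections have a well-defined enumeration order, as lists and arrays do; hash-based collections do not guarantee one.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedCompare
{
    // Ordered comparison: walk both enumerators together, stop at the first mismatch.
    public static bool SequenceEquals<T>(ICollection<T> a, ICollection<T> b)
    {
        if (a.Count != b.Count)
            return false; // Count is typically O(1) for ICollection<T> implementations

        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        using (IEnumerator<T> ea = a.GetEnumerator())
        using (IEnumerator<T> eb = b.GetEnumerator())
        {
            while (ea.MoveNext())
            {
                // Defensive: also fail if b runs out early.
                if (!eb.MoveNext() || !comparer.Equals(ea.Current, eb.Current))
                    return false;
            }
        }
        return true;
    }
}
```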

Oh, and another suggestion: you could override Equals for the collection class and implement the equality logic there (depends on your project, though).
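One way to put the equality logic into the collection class is shown in this sketch, built on System.Collections.ObjectModel.Collection<T>. The type name ValueCollection<T> is hypothetical, and "equal" is assumed to mean same items in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical collection type with value (item-by-item, ordered) equality.
public class ValueCollection<T> : Collection<T>
{
    public override bool Equals(object obj)
    {
        ValueCollection<T> other = obj as ValueCollection<T>;
        if (other == null || other.Count != Count)
            return false;

        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < Count; i++)
        {
            if (!comparer.Equals(this[i], other[i]))
                return false;
        }
        return true;
    }

    // Equal collections must have equal hash codes, so combine item hashes in order.
    public override int GetHashCode()
    {
        EqualityComparer<T> comparer = EqualityComparer<T>.Default; // null items hash to 0
        int hash = 17;
        foreach (T item in this)
            hash = unchecked(hash * 31 + comparer.GetHashCode(item));
        return hash;
    }
}
```

Because the hash code depends on the contents, mutating an instance changes it, so mutable instances should not be used as dictionary keys.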

Dan C.
+2  A: 

First compare the .Count of the collections; if they have the same count, then do a brute-force comparison of all elements. The worst case is O(n). This applies when the order of the elements needs to be the same.

In the second case, where the order does not matter, you can use a dictionary to store the count of each element found in the collections. Here's a possible algorithm:

  • Compare the collections' Count: return false if they are different
  • Iterate the first collection
    • If the item isn't in the dictionary, add an entry with Key = item, Value = 1 (the count)
    • If the item exists, increment its count in the dictionary
  • Iterate the second collection
    • If the item is not in the dictionary, return false
    • If the item is in the dictionary, decrement its count
      • If the count reaches 0, remove the item
  • return Dictionary.Count == 0;
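The algorithm above translates fairly directly into C# 2.0, since Dictionary<TKey,TValue> and TryGetValue are available there. One detail the steps gloss over: Dictionary keys cannot be null, so this sketch (the name CountingCompare is hypothetical) counts null items separately.

```csharp
using System;
using System.Collections.Generic;

public static class CountingCompare
{
    // Unordered ("same items, same multiplicities") comparison via per-item counts.
    public static bool UnorderedEquals<T>(ICollection<T> a, ICollection<T> b)
    {
        if (a.Count != b.Count)
            return false;

        Dictionary<T, int> counts = new Dictionary<T, int>();
        int nullCount = 0; // Dictionary rejects null keys, so track nulls here

        // First pass: count occurrences of each item in a.
        foreach (T item in a)
        {
            if (item == null) { nullCount++; continue; }
            int n;
            counts.TryGetValue(item, out n); // n is 0 when the key is absent
            counts[item] = n + 1;
        }

        // Second pass: decrement for each item in b, removing keys that reach 0.
        foreach (T item in b)
        {
            if (item == null)
            {
                if (--nullCount < 0) return false;
                continue;
            }
            int n;
            if (!counts.TryGetValue(item, out n))
                return false; // b has an item a lacks, or more copies of it
            if (n == 1)
                counts.Remove(item);
            else
                counts[item] = n - 1;
        }

        return counts.Count == 0 && nullCount == 0;
    }
}
```

With a reasonable GetHashCode, each lookup is O(1) on average, so the whole comparison is expected O(n).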
Pop Catalin
+4  A: 
OK, I guess ContainsCount uses the hash for lookup, so the lookups are O(1) and overall this is O(n), although if "this" contains a superset of the items, it will return true...
Chris Kimpton
+1  A: 

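For ordered collections, you can use the SequenceEqual() extension method defined by System.Linq.Enumerable (this needs C# 3.0 and .NET 3.5, so it is not available in a C# 2.0 project):

// needs "using System.Linq;"
if (firstCollection.SequenceEqual(secondCollection)) { /* same elements, same order */ }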

HTH, Kent

Kent Boogaart
+1  A: 

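Again, using the C5 library, if you have two sets you could use:

// C5.HashSet<T> is one concrete C5 collection; an interface cannot be instantiated
C5.ICollection<T> set1 = new C5.HashSet<T>();
C5.ICollection<T> set2 = new C5.HashSet<T>();
// ... populate set1 and set2 ...
if (set1.UnsequencedEquals(set2)) {
  // Do something
}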

The C5 library includes a heuristic that first compares the unsequenced hash codes of the two sets (see C5.ICollection<T>.GetUnsequencedHashCode()), so if the hash codes differ, it can return false without iterating over every item. Equal hash codes don't prove equality, so in that case an item-by-item comparison is still needed.
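To illustrate the idea behind that heuristic (this is not C5's actual hash function), an order-independent hash can be built by summing the item hashes, since addition is commutative. The names UnsequencedHash, Compute and MightBeEqual are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class UnsequencedHash
{
    // Order-independent hash: any permutation of the same items gives the same value.
    public static int Compute<T>(ICollection<T> items)
    {
        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        int hash = 0;
        foreach (T item in items)
            hash = unchecked(hash + comparer.GetHashCode(item));
        return hash;
    }

    // Cheap pre-check: a false result proves the collections differ;
    // a true result proves nothing, so a full comparison must follow.
    public static bool MightBeEqual<T>(ICollection<T> a, ICollection<T> b)
    {
        return a.Count == b.Count && Compute(a) == Compute(b);
    }
}
```

Recomputing the hash on every call is itself O(n), so the pre-check only pays off if the hash is cached or updated incrementally as items are added and removed, which this sketch does not do.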

Also worth noting: C5.ICollection<T> inherits from System.Collections.Generic.ICollection<T>, so you can use C5 implementations while still programming against the .NET interfaces (though .NET's stingier interfaces give you access to less functionality).

Marcus Griep