ansaurus

Question

What is the most performant way to check for existence with a collection of integers?

Answer 1

+15 A:

Use a HashSet<T>:

The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

HashSet<T> even exposes a constructor that accepts an IEnumerable<T>. By passing your List<T> to the HashSet<T>'s constructor you will end up with a reference to a new HashSet<T> that will contain a distinct sequence of items from your original List<T>.
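A minimal sketch of the constructor approach described above, assuming the integers already live in a List<int>. The class and method names (ExistenceCheck, ToSet) are made up for illustration and are not part of the framework.

```csharp
using System.Collections.Generic;

public static class ExistenceCheck
{
    // Copies the list into a HashSet<int>. Duplicate values in the
    // input collapse into a single entry in the set.
    public static HashSet<int> ToSet(List<int> numbers)
    {
        return new HashSet<int>(numbers);
    }
}
```

Once the set is built, set.Contains(value) answers each existence query in expected constant time, whereas List<T>.Contains scans the list on every call.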

Andrew Hare 2009-08-21 20:30:11

When inputList.Count != hashSet.Count, "Houston, we have duplicates!"

sixlettervariables 2009-08-21 20:34:43

Which is still O(n), the best I think he can get.

Marc 2009-08-21 20:35:10

@sixlettervariables - Excellent point!

Andrew Hare 2009-08-21 20:35:21

@Andrew: he could add the items one by one to a HashSet and throw an exception as soon as hashSet.Contains(item) returns true (or hashSet.Add(item) returns false). That would save going all the way through the list if there was a duplicate.
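One way the early-exit idea might look, as a sketch only. It relies on HashSet<T>.Add returning false when the item is already present. The names DuplicateCheck and HasDuplicates are hypothetical, and this version returns a bool rather than throwing, so the caller can decide what to do.

```csharp
using System.Collections.Generic;

public static class DuplicateCheck
{
    // Returns true at the first repeated item and stops enumerating there.
    public static bool HasDuplicates(IEnumerable<int> items)
    {
        var seen = new HashSet<int>();
        foreach (int item in items)
        {
            // Add returns false when the item is already in the set.
            if (!seen.Add(item))
            {
                return true;
            }
        }
        return false;
    }
}
```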

sixlettervariables 2009-08-22 13:46:33

@sixlettervariables - Very true, at that point he would no longer need a `HashSet<T>` as any implementation of `IList<T>` has the `Contains` method.

Andrew Hare 2009-08-22 13:57:06

Answer 2

+1 A:

Sounds like a job for a HashSet...

Dan Diplo 2009-08-21 20:30:14

Answer 3

A:

If you are using .NET Framework 3.5 or later, you can use the HashSet<T> collection.

Otherwise the best option is a Dictionary<TKey, TValue>. The value of each item will be wasted, but that will give you the best performance.

If you check for duplicates while you add the items to the HashSet/Dictionary instead of counting them afterwards, you can stop at the first duplicate. The worst case is still O(n), but when there are duplicates you don't have to continue looking through the rest of the items.
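For the Dictionary fallback, a sketch along these lines should work on frameworks that predate HashSet<T>. The bool value is just a placeholder, and the names DictionaryMembership and HasDuplicates are invented for the example. Explicit types are used instead of var so that it stays within C# 2.0 syntax.

```csharp
using System.Collections.Generic;

public static class DictionaryMembership
{
    // Uses the dictionary keys as a set; the bool values are never read.
    public static bool HasDuplicates(IEnumerable<int> items)
    {
        Dictionary<int, bool> seen = new Dictionary<int, bool>();
        foreach (int item in items)
        {
            // Stop at the first key that was already added.
            if (seen.ContainsKey(item))
            {
                return true;
            }
            seen.Add(item, true);
        }
        return false;
    }
}
```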

Guffa 2009-08-21 20:32:41

Answer 4

A:

Dictionary would be sufficient from a purely runtime complexity perspective, since ContainsKey is an O(1) operation, but as others have pointed out, the HashSet is better since you do not need to store key-value pairs.

Brian Gideon 2009-08-21 20:42:12

Answer 5

A:

If the set of numbers is sparse, then as others suggest use a HashSet.

But if the set of numbers is mostly in sequence with occasional gaps, it would be a lot better if you stored the number set as a sorted array or binary tree of begin,end pairs. Then you could search to find the pair with the largest begin value that is less than or equal to your search key, and compare the key with that pair's end value to see if it exists in the set.
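The begin,end lookup could be sketched roughly as follows, using two parallel sorted arrays and Array.BinarySearch. This assumes the ranges are non-overlapping, sorted by begin, and have inclusive ends with begin <= end. The RangeSet type is hypothetical and does nothing to validate those assumptions beyond checking array lengths.

```csharp
using System;

public sealed class RangeSet
{
    private readonly int[] begins; // range starts, sorted ascending
    private readonly int[] ends;   // ends[i] is the inclusive end of the range starting at begins[i]

    // Assumption: ranges do not overlap and begins is already sorted.
    public RangeSet(int[] begins, int[] ends)
    {
        if (begins.Length != ends.Length)
        {
            throw new ArgumentException("begins and ends must have the same length");
        }
        this.begins = begins;
        this.ends = ends;
    }

    public bool Contains(int key)
    {
        int index = Array.BinarySearch(begins, key);
        if (index >= 0)
        {
            return true; // key is exactly the start of a range
        }
        // For a miss, BinarySearch returns the bitwise complement of the
        // index of the first element greater than key (or of the length).
        int insertion = ~index;
        if (insertion == 0)
        {
            return false; // key is smaller than every range start
        }
        // The candidate range is the one with the largest begin below key.
        return key <= ends[insertion - 1];
    }
}
```

Each lookup is O(log n) in the number of ranges rather than the number of integers, which is where the saving comes from when the data is mostly contiguous.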

Zan Lynx 2009-08-21 21:40:52

Answer 6

A:

What about doing:

list.Distinct().Count() != list.Count()

I wonder about the performance of this. I think it would be as good as O(n) but with less code and still easily readable.
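Wrapped in a helper, the comparison might look like the sketch below (DistinctCheck and HasDuplicates are illustrative names). Distinct() tracks seen items in a hash-based set, so the expected cost is O(n), but it always walks the entire list and cannot stop at the first duplicate.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctCheck
{
    // True when at least one value occurs more than once in the list.
    public static bool HasDuplicates(List<int> list)
    {
        // Distinct() enumerates the whole list before Count() returns.
        return list.Distinct().Count() != list.Count;
    }
}
```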

Stephen B. Burris Jr. 2009-08-22 16:24:14
