views: 1829

answers: 7
I have 60k items that need to be checked against a 20k lookup list. Is there a collection object (like List or HashTable) that provides an exceptionally fast .Contains() method? Or will I have to write my own? In other words, does the default .Contains() method just scan each item, or does it use a better search algorithm?

foreach (Record item in LargeCollection)
{
  if(LookupCollection.Contains(item.Key))
  {
     // Do something
  }
  else
  {
     // Don't
  }
}

Note: The lookup list is already sorted.

+12  A: 

System.Collections.Generic.HashSet
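A minimal sketch of this approach. It assumes the lookup keys are `long`s (as the comments below suggest), so the lookup side can be a `HashSet<long>` and no custom `GetHashCode` is needed; the `Record` class and sample data here are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;

class Record
{
    public long Key;
}

class Program
{
    static void Main()
    {
        var largeCollection = new List<Record>
        {
            new Record { Key = 1 }, new Record { Key = 42 }
        };
        long[] lookupKeys = { 42, 99 };

        // One-time O(n) build of the hash set.
        var lookupSet = new HashSet<long>(lookupKeys);

        foreach (Record item in largeCollection)
        {
            if (lookupSet.Contains(item.Key))   // O(1) on average
            {
                Console.WriteLine(item.Key);    // Do something
            }
        }
    }
}
```

With 60k items against a 20k set, this does one O(n) build plus 60k constant-time probes, instead of 60k linear scans.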

Jimmy
Note: Do not forget to override the GetHashCode function. For added performance, pre-generate your hash code in your constructor.
Brian
@Brian: good point. I was assuming (baselessly) Record.Key was a builtin type of some kind.
Jimmy
Record.Key is just a long
@Brian: instead of pregenerating I prefer to store the generated one the first time, why to slow down the constructor with something you don't know if it will be used?
jmservera
@jmservera: True, but storing the generated hash seems more complicated to me (though only slightly), and often has no meaningful performance impact. And I suspect it makes the GetHashCode function slightly more expensive, though this is also negligible.
Brian
Given the number of searches required, for this *particular* situation, Clemahieu's answer seems to be more efficient. If he plans on doing more lookups in the future, a HashSet is a wise option, as it will perform individual searches very efficiently.
Matt Boehm
My apologies, BubbaFat and Qberticus also mentioned this approach.
Matt Boehm
@Matt Boehm: for this *particular* situation, it wasn't given that LargeCollection started out sorted.
Jimmy
FYI: Performance test - I created a comparison between List<T> and HashSet<T> for strings. I found that HashSet was about 1000 times faster than List.
Quango
+6  A: 

If you don't need ordering, try HashSet<Record> (new to .Net 3.5)

If you do, use a sorted List<Record> and check with BinarySearch.

SLaks
+4  A: 

Have you considered List.BinarySearch(item)?

You said that your lookup list is already sorted, so this seems like the perfect opportunity. A hash would definitely be the fastest, but it brings its own problems and requires a lot more overhead for storage.
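A short sketch of the BinarySearch idea, assuming the lookup keys are kept in a sorted List<long> (the names and sample values here are hypothetical):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // The list must stay sorted for BinarySearch to give correct results.
        var lookupList = new List<long> { 3, 7, 42, 99 };

        long[] keysToCheck = { 42, 5 };
        foreach (long key in keysToCheck)
        {
            // BinarySearch is O(log n); a non-negative result is the index of a match.
            int index = lookupList.BinarySearch(key);
            Console.WriteLine(index >= 0 ? $"{key} found" : $"{key} missing");
        }
    }
}
```

Each lookup is O(log n) with no extra storage, versus O(1) average but extra memory for a hash set.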

Mark
You are right, a hash may bring some undesirable problems when using mutable objects as a key.
jmservera
+1  A: 

If you aren't worried about squeezing out every last bit of performance, the suggestion to use a HashSet or binary search is solid. Your datasets just aren't large enough for this to be a problem 99% of the time.

But if this is just one of thousands of times you are going to do this, and performance is critical (and proven to be unacceptable using HashSet/binary search), you could certainly write your own algorithm that walks the sorted lists, doing comparisons as you go. Each list would be walked at most once, and even the pathological cases wouldn't be bad (once you went this route you'd probably find that the comparison, assuming it's a string or other non-integral value, would be the real expense, and that optimizing it would be the next step).

Bubbafat
+2  A: 

If it's possible to sort your items, there is a much faster way to do this than doing key lookups into a hashtable or b-tree. Though if your items aren't sortable, you can't really put them into a b-tree anyway.

Anyway, if they are sortable, sort both lists; then it's just a matter of walking the lookup list in order.

Walk lookup list
   While items in check list <= lookup list item
     if check list item = lookup list item do something
   Move to next lookup list item
Qberticus
+1  A: 

If you're using .Net 3.5, you can make cleaner code using:

foreach (Record item in LookupCollection.Intersect(LargeCollection))
{
  //dostuff
}

I don't have .Net 3.5 here, so this is untested. It relies on an extension method. Note that LookupCollection.Intersect(LargeCollection) is probably not the same as LargeCollection.Intersect(LookupCollection) ... the latter is probably much slower.

This assumes LookupCollection is a HashSet

Brian
+2  A: 

Keep both lists x and y in sorted order.

If x = y, do your action; if x < y, advance x; if y < x, advance y; repeat until either list is empty.

The run time of this intersection is proportional to min(size(x), size(y)).

Don't run a .Contains() loop; that is proportional to x * y, which is much worse.
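The walk described above can be sketched as a single-pass merge-style intersection. This assumes both sequences hold comparable long keys and are already sorted; the method name and sample data are illustrative only:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Yields the intersection of two sorted sequences in a single pass:
    // at most size(x) + size(y) comparisons, no hashing, no extra storage.
    static IEnumerable<long> SortedIntersect(IList<long> x, IList<long> y)
    {
        int i = 0, j = 0;
        while (i < x.Count && j < y.Count)
        {
            if (x[i] == y[j]) { yield return x[i]; i++; j++; }
            else if (x[i] < y[j]) i++;   // advance x
            else j++;                    // advance y
        }
    }

    static void Main()
    {
        var x = new List<long> { 1, 3, 5, 7 };
        var y = new List<long> { 3, 4, 7, 9 };
        foreach (long k in SortedIntersect(x, y))
            Console.WriteLine(k);
    }
}
```

The loop stops as soon as either list is exhausted, which is why a short list against a long one finishes early.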

+1 for the more efficient algorithm. Even if the lists are currently unsorted, it would be more efficient to first sort them and then run this algorithm.
Matt Boehm
Wouldn't the runtime be proportional to max(size(x), size(y)) in the worst case scenario, though? Example: int[] x = {99,100}; int[] y = {0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
Matt Boehm
No, because once you complete the smaller set, you can append the remaining elements from the larger set, because they are already sorted. I think this process is similar to Merge Sort.