ansaurus

Question

Is HashSet<T> the fastest container to look up in?

Answer 1

+7 A:

Yes, HashSet is perfect for this since it contains one value to look up unlike a Dictionary which requires a key and a value.

Ahmad Mageed 2010-02-13 20:50:51

Answer 2

+14 A:

Does HashSet work? Sure. But that's not the question you asked. You asked for the fastest possible lookup.

Is it the fastest possible? No, of course not, not by any measure.

First off, in order to talk about "fastest" we need to describe precisely what "fastest" means. Do you mean:

smallest worst possible case timing
smallest average timing averaged over many timings
smallest average timing given a particular usage pattern
something else

? Please clarify precisely what "fastest possible" means. We can devise you an algorithm which is the in theory fastest possible only if we know precisely what fastest possible means to you.

For example, suppose you are writing a compiler. Something we have to do all the time in compilers is check whether a particular string is in a list of strings. Perhaps we are checking to see if a string is a keyword, so we have to look up whether a given string is inside the set {"int", "double", "for", "foreach", "class" ... }

We could put those in a hash set and get decent performance. But if we wanted the best possible performance we could do a lot better. We could, for example, do an analysis of a few billion lines of existing source code to find out which keywords were most common and which were least common, and then write a custom hash table that optimized for (1) rapidly rejecting things that were not keywords at all, and (2) rapidly recognizing the most common keywords at the expense of recognizing other keywords.

Note that this requires static analysis; though it performs well on typical cases, it performs poorly on those rare cases where there are lots of rare keywords used. Another approach we could take would be to write a self tuning hash table that dynamically identified when particular strings were being searched for frequently.

Consider, for example, if you are writing an implementation of the JScript runtime. We frequently must look for a string in a set of strings:

for(i = 0; i < 10; ++i) { foo.bar(i); }

Here we must look up the string "bar" inside the object identified by "foo" ten times. The hash table inside "foo" that implements that lookup notices the first time through the loop that "bar" has been used, so it dynamically tweaks the hash table structure so that the second time through the loop, the lookup is faster. This is the strategy we employed in our implementation of JScript.

Now, that optimizes the case for loops, but it makes this case potentially slower than it could be:

for(i = 0; i < 10; ++i) { foo.bar(i); foo.blah(i); foo.abc(i); }

because we don't do more analysis and realize "hey, we just re-optimized this hash table three times, and now we're going to do it all again, maybe we should just leave it as it is."

Fortunately for us, we were not, like you, looking for the fastest possible lookup. We were only looking for a reasonably fast lookup.

Can you carefully and completely describe what exactly your usage case is for the fastest possible lookup? There are lots of algorithms you can use to speed up lookups, but they get very complicated.

Eric Lippert 2010-02-14 18:04:19

Eric,thank you so much for such advanced answer!My usage case is very simple i think. Pages in my asp.net application has some asp.net 2.0 control (such is DetailsView or GridView). A superclass of this pages creates a dictionary where control's data fields are the keys and appropriate localized strings are the values. Superclass calls overridden property of HashSet contains set of fields required by specific page and dynamically creates a radio button list.This is a search panel.So while iterating dictionary I have to ask page does it's set contains selected field to insert it into the table.

abatishchev 2010-02-15 00:43:14

@abatishchev: do you have any evidence that the performance of your application is gated by this lookup? That is, is this lookup the *slowest thing in your application*? If this is not the gating factor then why do you care whether it is as fast as possible? Find the slowest component and improve *its* performance.

Eric Lippert 2010-02-15 00:53:05

Yea, of course I agree with you in your development tactics advice! Just I need to say that my development first of all is education so this is just an example how I try to get know more about.. for example generic containers.

abatishchev 2010-02-15 01:08:57

ansaurus

tags:

views:

answers:

Is HashSet<T> the fastest container to look up in?

related questions