views:

180

answers:

4

I'm using a Dictionary<> to store a bazillion items. Is it safe to assume that as long as the server's memory has enough space to accommodate these bazillion items that I'll get near O(1) retrieval of items from it? What should I know about using a generic Dictionary as huge cache when performance is important?

EDIT: I shouldn't rely on the default implementations? What makes for a good hashing function?

+1  A: 

Yes, you will have near O(1) no matter how many objects you put into the Dictionary. But for the Dictionary to be fast, your key-objects should provide a sufficient GetHashCode-implementation, because Dictionary uses a hashtable inside.

Maximilian Mayerl
+12  A: 

It depends, just about entirely, on how good a hashing functionality your "bazillion items" support -- if their hashing function is not excellent (so that many conflicts result) your performance will degrade with the growth of the dictionary.

Alex Martelli
+3  A: 

Yes you will have O(1) access times. In fact to be pedantic g it will be exactly O(1). You need to ensure that all your objects that are used as keys have a good GetHashCode implementation and should likely override Equals.

Edit to clarify: In reality acess times will get slower the more items you have unless you can provide a "perfect" hash function.

Foxfire
+5  A: 

You should measure it and find out. You're the one who has knowledge of the exact usage of your dictionary, so you're the one who can measure it to see if it meets your needs.

A word of advice: I have in the past done performance analysis on large dictionary structures, and discovered that performance did degrade as the dictionary became extremely large. But it seemed to degrade here and there, not consistently on each operation. I did a lot of work trying to analyze the hash algorithms, etc, before smacking myself in the forehead. The garbage collector was getting slower because I had so much live working set; the dictionary was just as fast as it always was, but if a collection happened to be triggered, then that was eating up my cycles.

That's why it is important to not do performance testing in unrealistic benchmark scenarios; to find out what the real-world performance cost of your bazillion-item dictionary is, well, that's going to be gated on lots of stuff that has nothing to do with your dictionary, like how much collection triggering is happening throughout the rest of your program, and when.

Eric Lippert
Thanks Eric. What do you mean by collection triggering? You mean accessing the collection?
Joan Venge
I mean an action which triggers the garbage collector to start collecting.
Eric Lippert
Thanks Eric..........
Joan Venge