Hello, I will be creating a project that will use dictionary lookups and inserts quite a bit. Is this something to be concerned about?

Also, if I do benchmarking and such and it is really bad, then what is the best way of replacing dictionary with something else? Would using an array with "hashed" keys even be faster? That wouldn't help with insert time though, would it?

Also, I don't think I'm micro-optimizing, because this really will be a significant part of the code on a production server, so if this takes an extra 100ms to complete, then we will be looking for new ways to handle it.

+1  A: 

Have a look at C# HybridDictionary Usage

HybridDictionary Class

This class is recommended for cases where the number of elements in a dictionary is unknown. It takes advantage of the improved performance of a ListDictionary with small collections, and offers the flexibility of switching to a Hashtable, which handles larger collections better than ListDictionary.
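
For reference, a minimal usage sketch (the keys here are made up; note that HybridDictionary is non-generic, so values come back as object and need a cast):

using System;
using System.Collections.Specialized;

// HybridDictionary starts out as a ListDictionary and switches to a
// Hashtable internally once the collection grows.
var settings = new HybridDictionary();
settings["Timeout"] = 30;
settings["Retries"] = 3;

int timeout = (int)settings["Timeout"];
Console.WriteLine(timeout); // prints 30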

astander
Non-generic equals non-starter, for me. I do wonder why they never made a generic version of this class, though.
Joel Mueller
+4  A: 

I would do a benchmark of the Dictionary, Hashtable (HashSet in .NET), and perhaps a home-grown class, and see which works out best under your typical usage conditions.

Normally I would say it's fine (insert StackOverflow's favorite premature optimization quote here), but if this is a core piece of the application: Benchmark, Benchmark, Benchmark.
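
A rough sketch of that kind of benchmark, just to illustrate the idea (the key/value types and iteration count are arbitrary, and a profiler or a dedicated benchmarking harness will give more trustworthy numbers than a bare Stopwatch):

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

const int N = 1000000; // arbitrary volume; use something representative of your real workload

var dict = new Dictionary<int, string>();
var table = new Hashtable();

var sw = Stopwatch.StartNew();
for (int i = 0; i < N; i++) dict[i] = "value";
Console.WriteLine("Dictionary inserts: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
for (int i = 0; i < N; i++) table[i] = "value"; // int keys are boxed here, which skews the comparison
Console.WriteLine("Hashtable inserts:  {0} ms", sw.ElapsedMilliseconds);

long checksum = 0;
sw.Restart();
for (int i = 0; i < N; i++) checksum += dict[i].Length;
Console.WriteLine("Dictionary lookups: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
for (int i = 0; i < N; i++) checksum += ((string)table[i]).Length;
Console.WriteLine("Hashtable lookups:  {0} ms (checksum {1})", sw.ElapsedMilliseconds, checksum);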

Neil N
A Dictionary<TKey, TValue> should always outperform a HashTable. .NET HashTables were a bad idea even when Dictionary<TKey, TValue> didn't exist :)
Cory Charlton
Should? And I meant HashSet<T>. I never liked the old Hashtables in .NET.
Neil N
+41  A: 
  1. C# doesn't have any dictionaries. It's a programming language. You mean ".NET Dictionaries".
  2. You are micro-optimizing. Do you even have working code yet? You have no idea where the bottlenecks will be, and already you're focused on Dictionary.
  3. How do you know how the Dictionary class is implemented? Maybe it already uses an array with hashed keys!
John Saunders
+1 for calling out premature optimization... you can't optimize code that hasn't been written
Dave Swersky
Another +1 for premature optimization. I agree so much I wrote a new comment in addition to upvoting the answer and clicking the up arrow on Dave's comment.
Daniel Schaffer
I was going to add a comment to this effect but I knew someone else would get to it first.
Gord
"If it doesn't work, it doesn't matter how fast it doesn't work." http://stackoverflow.com/questions/58640/great-programming-quotes/1649033#1649033
Kyralessa
It's not "premature optimization" if it's something that will influence the design of the application. You don't want to write an application then halfway through realize that your design is wrong and that you have to rewrite it. People are so quick to regurgitate out-of-context quotes...
Josh Davis
No, it's premature optimization because the code doesn't work yet! Make it work, make the automated tests pass, and then, if you find you need to change the design because the performance is inadequate, you at least have all the automated tests written to prove that your faster code still _works_. You also don't know ahead of time that this will affect the design.
John Saunders
There is a video, http://www.youtube.com/watch?v=aAb7hSCtvGw, I saw once with Joshua Bloch talking about how in some cases it does pay to think about performance before it becomes a problem (I think it's about halfway through). From the Great Programming Quotes CW: "Weeks of coding can save you hours of planning."
Courtney de Lautour
@Downvoters should say what the problem is when downvoting. Nobody will notice that you've downvoted otherwise.
John Saunders
+5  A: 

The Dictionary<TKey, TValue> class is actually implemented as a hash table, which makes lookups very fast (close to O(1)). See the API documentation for more information. I doubt you could make a better implementation yourself.

Anders Fjeldstad
Not to be Mr Fuzzy Pants here, but O(1) doesn't mean it's fast, it just means that it's constant time. But that constant time might be very long or very short. That said, O(1) tends to be fast in practice.
JulianR
A: 

You may want to look at the KeyedCollection class in System.ObjectModel. From the MSDN description, "provides the abstract base class for a collection whose keys are embedded in the values."

antonm
Guess what type `KeyedCollection` is based on to provide amortized O(1) retrieval by key?..
Pavel Minaev
+6  A: 

  1. Wait and see if the performance of your application is below expectations.
  2. If it is, then use a profiler to determine if the Dictionary lookup is the source of the problem.
  3. If it is, then do some tests with representative data to see if another choice of list would be quicker.

In short - no, in general you shouldn't worry about the performance of implementation details until after you have a problem.

Kragen
+2  A: 

If your application is multithreaded, then the key to performance is going to be synchronizing access to this Dictionary correctly.

If it is single-threaded, then the bottleneck will almost certainly be elsewhere, such as reading these objects from wherever you are reading them.
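
For illustration only, a coarse lock-based wrapper might look like the sketch below (the type and member names are made up; on .NET 4 and later, ConcurrentDictionary<TKey, TValue> covers this scenario out of the box):

using System.Collections.Generic;

// Hypothetical wrapper type: serializes all access to the inner
// Dictionary with one lock. Coarse-grained, but correct.
public class SynchronizedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _inner = new Dictionary<TKey, TValue>();
    private readonly object _gate = new object();

    public void Add(TKey key, TValue value)
    {
        lock (_gate) { _inner.Add(key, value); }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_gate) { return _inner.TryGetValue(key, out value); }
    }
}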

yu_sha
+3  A: 

The only concern that I can think of is that the speed of the dictionary relies on the key class having a reasonably fast GetHashCode method. Lookups and inserts are really fast, so you shouldn't have any problem there.
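
For example, if you use your own type as a key, a cheap GetHashCode (and a matching Equals) is what keeps those operations fast. A hypothetical key type, purely as a sketch:

using System;

// Hypothetical key type: the hash code is derived from cheap, immutable
// fields, and Equals agrees with GetHashCode.
public sealed class CustomerKey : IEquatable<CustomerKey>
{
    public int Region { get; }
    public int Id { get; }

    public CustomerKey(int region, int id)
    {
        Region = region;
        Id = id;
    }

    public bool Equals(CustomerKey other) =>
        other != null && Region == other.Region && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as CustomerKey);

    // Cheap to compute; spreads values reasonably well.
    public override int GetHashCode() => (Region * 397) ^ Id;
}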

Regarding using an array, that's what the Dictionary class does already. Actually it uses two arrays, one for the keys and one for the values.

If you do run into any performance problems with a Dictionary, it would be quite easy to make a wrapper for any kind of storage that has the same methods and behaviour as a Dictionary, so that you can replace it seamlessly.

Guffa
+21  A: 

Hello, I will be creating a project that will use dictionary lookups and inserts quite a bit. Is this something to be concerned about?

Yes. It is always wise to consider performance factors up front.

The form that your concern should take is as follows: your concern should be encouraging you to write realistic, user-focused performance specifications. It should be encouraging you to start writing performance tests early, and running them often, so that you can see how every single change to the product affects performance. That way you will be informed immediately when a code change causes a user-affecting change in performance. And it should be encouraging you to run profiles often, so that you are reasoning about performance based on empirical measurements, rather than random guesses and hunches.

Also, if I do benchmarking and such and it is really bad, then what is the best way of replacing dictionary with something else?

The best way to do this is to build a reasonable abstraction layer. If you have a class (or interface) which represents the "insert" and "lookup" abstract data type, then you can replace its internals without changing any of the callers.

Note that adding a layer of abstraction itself has a performance cost. If your profiling shows that the abstraction layer is too expensive, if the extra couple nanoseconds per call is too much, then you might have to get rid of the abstraction layer. Again, this decision will be driven by real-world performance data.
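
As a sketch of what such a layer could look like (the interface and class names below are hypothetical, not taken from any library):

using System.Collections.Generic;

// Hypothetical abstraction: callers see only this interface.
public interface IKeyValueStore<TKey, TValue>
{
    void Insert(TKey key, TValue value);
    bool TryLookup(TKey key, out TValue value);
}

// One possible implementation, backed by Dictionary<TKey, TValue>; it could
// later be swapped for another data structure without touching any callers.
public class DictionaryStore<TKey, TValue> : IKeyValueStore<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();

    public void Insert(TKey key, TValue value) => _items[key] = value;

    public bool TryLookup(TKey key, out TValue value) => _items.TryGetValue(key, out value);
}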

Would using an array with "hashed" keys even be faster? That wouldn't help with insert time though, would it?

Neither you nor anyone reading this can possibly know which one is faster until you write it both ways and then benchmark it both ways under real-world conditions. Doing it under "lab" conditions will skew your results; you'll need to understand how things work when the GC is under realistic memory pressure, and so on. You might as well ask us which horse will run faster in next year's Kentucky Derby. If we knew the answer just by looking at the racing form, we'd all be rich already. You can't possibly expect anyone to know which of two entirely hypothetical, unwritten pieces of code will be faster under unspecified conditions!

Eric Lippert
I agree with, and have used in practice, this approach of using an interface/abstraction up front and then removing it as performance testing indicates. And I'd add: don't remove that interface too soon! It can cost you design flexibility down the road, which can lead to a lot of refactoring... I've been there too.
Paul
+1  A: 

You may consider using the C5 library. I've found it to be very fast and thoughtfully designed. Others on Stack Overflow have found the same. With C5 you have the option of using general type interfaces (with a capital I), or using the underlying data structures directly. Naturally the interfaces allow you to swap out different implementations, but I have found in performance testing that the interfaces will cost you.

Paul
+1  A: 

I'm not sure that anyone has really answered this part yet:

Also, if I do benchmarking and such and it is really bad, then what is the best way of replacing dictionary with something else?

For this, wherever possible, declare your variables as IDictionary<TKey, TValue>. That's the main interface that Dictionary implements. (I'm assuming that if you care that much about performance, then you aren't considering non-generic collections.) Then, in the future, you can change the underlying implementation class without having to change any of the code that uses that dictionary. For example:

IDictionary<string, int> myDict = new Dictionary<string, int>();
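
Swapping in a different implementation later then touches only that one declaration, for example (SortedDictionary is just one hypothetical alternative):

IDictionary<string, int> myDict = new SortedDictionary<string, int>();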
Morbo
@Morbo: Eric Lippert said, "The best way to do this is to build a reasonable abstraction layer." in his answer.
John Saunders