I am trying to optimize a piece of .NET 2.0 C# code that looks like this:

Dictionary<myType, string> myDictionary = new Dictionary<myType, string>();
// some other stuff
// inside a loop check if key is there and if not add element
if(!myDictionary.ContainsKey(currentKey))
{
   myDictionary.Add(currentKey, "");
}

It looks like whoever wrote this code used a Dictionary even though one isn't needed (only the key is being used to store a list of unique values), presumably because it's faster to search than a List of myType objects. This seems obviously wrong, since only the key of the dictionary is ever used, but I am trying to understand the best way to fix it.

Questions:

1) I seem to understand I would get a good performance boost even just using .NET 3.5 HashSet. Is this correct?

2) What would be the best way to optimize the code above in .NET 2.0 and why?

EDIT: This is existing code I am trying to optimize. It loops through tens of thousands of items and calls ContainsKey for each one of them. There's gotta be a better way of doing it (even in .NET 2.0)! :)

+8  A: 

I think you need to break this down into 2 questions

Is Dictionary<myType,string> the best available type for this scenario?

No. Based on your breakdown, HashSet<myType> is clearly the better choice because its usage pattern more accurately fits the scenario.
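As a minimal sketch of what the .NET 3.5 version could look like (the helper and its names are invented for illustration): HashSet<T>.Add already returns false for duplicates, so the ContainsKey-style check disappears entirely.

```csharp
using System;
using System.Collections.Generic;

static class HashSetSketch
{
    // Add silently ignores duplicates and returns false for them,
    // so no separate Contains check is needed.
    public static int CountDistinct(IEnumerable<string> items)
    {
        HashSet<string> seen = new HashSet<string>();
        foreach (string item in items)
        {
            seen.Add(item);
        }
        return seen.Count;
    }

    static void Main()
    {
        string[] items = { "a", "b", "a", "c" };
        Console.WriteLine(CountDistinct(items)); // prints 3
    }
}
```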

Will switching to HashSet<myType> give me a performance boost?

This is really subjective and only a profiler can give you the answer to this question. Likely you'll see a very minor memory size improvement per element in the collection. But in terms of raw computing power I doubt you'll see a huge difference. Only a profiler can tell you if there is one.

Before you ever make a performance related change to your code remember the golden rule.

Don't make any performance related changes until a profiler has told you precisely what is wrong with your code.

Making changes that violate this rule is just guessing. A profiler is the only way to measure the success of a performance fix.
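If a full profiler isn't handy, a rough sketch of that measurement (not a substitute for real profiling; the numbers and helper name here are invented) is to time the hot loop with System.Diagnostics.Stopwatch before and after the change:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class TimingSketch
{
    // Times the original ContainsKey-then-Add loop over n keys,
    // with roughly half the keys being duplicates.
    public static long TimeFillMs(int n)
    {
        Dictionary<int, string> dict = new Dictionary<int, string>();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            int key = i % (n / 2);
            if (!dict.ContainsKey(key))
            {
                dict.Add(key, "");
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("elapsed ms: " + TimeFillMs(100000));
    }
}
```

Run the same loop with the candidate replacement and compare; anything under a few milliseconds' difference is noise.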

JaredPar
+1 but the last sentence should be the big bold one :)
Rex M
@Rex, I made it a bit bigger :)
JaredPar
+1 Agree. Last sentence is the most important part of the post
zebrabox
If this is .NET 2.0, then `Dictionary<TKey, TValue>` is probably the best choice because `HashSet<T>` was added only in .NET 3.5.
0xA3
+1 for profiling before optimizing!
TrueWill
+2  A: 

1) No. A dictionary does a hash on the key, so your lookup should already be O(1). A HashSet should need less memory, though. But honestly, the difference isn't big enough that you will really see a performance boost.

2) Give us some more detail as to what you are trying to accomplish. The code you posted is pretty simple. Have you measured yet? Are you seeing that this method is slow? Don't forget "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth

Bob
Thanks - I am aware of premature optimization being bad. This is existing code I am trying to optimize; it loops through tens of thousands of items and calls ContainsKey for each one of them. There's gotta be a better way of doing it! :)
JohnIdol
+2  A: 
  1. Depending on the size of your keys, you may actually see performance degrade.

  2. One way in 2.0 would be to try to insert it and catch the exception (of course, this depends on how many duplicate keys you plan on having):

foreach (string key in keysToAdd)
{
    try
    {
        dictionary.Add(key, "myvalue");
    }
    catch (ArgumentException)
    {
        // key already exists; handle the duplicate here
    }
}
scottm
@divo, it depends on how many keys you are adding and whether you expect duplicates. The ContainsKey() call does a hash lookup every time, so over tens of thousands of iterations the "simple if" adds up. If you don't expect duplicates, you aren't using exceptions for control flow, because you aren't expecting an exception to be thrown. It can be cheaper than the if in that context.
scottm
I am expecting dupes and I need to filter them out - catching the exceptions, even if it's wrong to use it for flow control, sounds better than calling ContainsKey every single time (most of the items won't be dupes), but will it boost performance?
JohnIdol
You should follow JaredPar's advice on that question. If this portion of code is already noticeably affecting performance, you may try the change and see if you gain anything.
scottm
that was always the plan :)
JohnIdol
+1  A: 

The obvious mistake (if we discuss performance) I can see is the double work done by calling ContainsKey and then adding the key-value pair. When the pair is added using the Add method, the key is internally checked for presence again. The whole if block can be safely replaced by this:

... myDictionary[currentKey] = ""; ...

If the key already exists there, the value will simply be replaced and no exception will be thrown. Moreover, if the value is not used at all, I would personally use null to fill it. I can see no reason for using any string constant there.
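A small sketch of that replacement (the helper and its names are made up for illustration; the behavior relied on is just the documented indexer semantics of Dictionary<TKey, TValue>):

```csharp
using System;
using System.Collections.Generic;

static class IndexerSketch
{
    // The indexer inserts the key if absent and overwrites the value if
    // present, so the ContainsKey check can be dropped entirely.
    public static int CountDistinct(IEnumerable<string> keys)
    {
        Dictionary<string, string> myDictionary = new Dictionary<string, string>();
        foreach (string currentKey in keys)
        {
            myDictionary[currentKey] = null; // value is unused, null is fine
        }
        return myDictionary.Count;
    }

    static void Main()
    {
        string[] keys = { "a", "b", "a", "c", "b" };
        Console.WriteLine(CountDistinct(keys)); // prints 3
    }
}
```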

Alexander
A: 

The possible performance degrade mentioned by scottm is not for doing simple lookups. It is for calculating the intersection between 2 sets. HashSet does have slightly faster lookups than Dictionary. The performance difference really is going to be very small, though, as everyone says -- the lookup takes most of the time & creating the KeyValuePair takes very little.

For 2.0, you could make the "Value" object one of these:

public struct Empty {}

It may do slightly better than the "".
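A quick sketch of that idea (helper names invented): the empty struct stands in for the unused value, so no string reference is stored per entry.

```csharp
using System;
using System.Collections.Generic;

public struct Empty {}

static class EmptyValueSketch
{
    // Only the key carries information; the zero-size struct is a placeholder.
    public static int CountDistinct(IEnumerable<string> keys)
    {
        Dictionary<string, Empty> set = new Dictionary<string, Empty>();
        foreach (string key in keys)
        {
            set[key] = new Empty();
        }
        return set.Count;
    }

    static void Main()
    {
        Console.WriteLine(CountDistinct(new string[] { "x", "y", "x" })); // prints 2
    }
}
```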

Or you could try making a reference to System.Core.dll in your 2.0 project, so you can use the HashSet.

Also, make sure that GetHashCode and Equals are as efficient as possible for MyType. I've been bitten by using a dictionary on something with a really slow GetHashCode (I believe we tried to use a delegate as a key or something like that.)
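To illustrate that last point, here is a hypothetical myType with cheap, consistent Equals and GetHashCode overrides (the id field and helper are invented for the sketch; the key rule is that equal objects must produce equal hash codes):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type; the id field is invented for this sketch.
public class MyType
{
    private readonly int id;

    public MyType(int id) { this.id = id; }

    public override bool Equals(object obj)
    {
        MyType other = obj as MyType;
        return other != null && other.id == this.id;
    }

    // Keep this cheap and consistent with Equals: equal ids, equal hashes.
    public override int GetHashCode()
    {
        return id;
    }

    public static int CountDistinct(IEnumerable<MyType> keys)
    {
        Dictionary<MyType, string> dict = new Dictionary<MyType, string>();
        foreach (MyType key in keys)
        {
            dict[key] = "";
        }
        return dict.Count;
    }
}
```

With these overrides, two distinct MyType instances with the same id count as one key, and the hash computation is a single field read.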