views:

410

answers:

6

Hi,

I will use a Dictionary in a .NET project to store a large number of objects. Therefore I decided to use a GUID-string as a key, to ensure unique keys for each object.

Does a large key such as a GUID (or even larger ones) decrease the performance of a Dictionary, e.g. for retrieving an object via its key?

Thanks, Andrej

+3  A: 

Yes and no. Larger strings increase the memory size of a dictionary. And larger sizes mean slightly longer times to calculate hash sizes.

But worrying about those things is probably premature optimization. While it will be slower, it's not anything that you will probably actually notice.

Anthony Mills
+1  A: 

Apparently it does. Here is a good test: Dictionary String Key Test

Emjay
Not a really good test as it shows a performance gain of only 3 seconds for 100 million dictionary lookups when comparing 2 and 20 character keys. This is not a significant gain in most uses of a dictionary.
Daniel Dyson
+3  A: 

The actual size of an object with respect to retrieving values is irrelevant. The speed of lookup of values is much more dependent on the speed of two methods on the passed in IEqualityComparer<T> instance

  • GetHashcode()
  • Equals()

EDIT

A lot of people are using String as a justification for saying that larger object size decreases lookup performance. This must be taken with a grain of salt for several reasons.

  • The performance of the above said methods for String decrease in performance as the size of the string increases for the default comparer. Just because it's true for System.String does not mean it is true in general
  • You could just as easily write a different IEqualityComparer<String> in such a way that string length was irrelevant.
JaredPar
You still have to loop over the string to compare it in the first place, so size will hinder performance regardless of your algorithm.
Blindy
@Blindy no you don't. It all depends on what I think are equal strings in my comparer. Perhaps I only care about the first 3 letter prefix? Perhaps only the first letter. Or I just just say that non-null strings are equal. None of these will vary in size as the string gets bigger
JaredPar
This doesn't seem to apply to the OP then, he needs full comparisons on guid-like keys.
Blindy
@Blindy I read the OP's question as more general. Besides which people who say a String key performance has a direct performance relation to it's length are simply wrong. It's an issue of the IEqualityComparer<TKey>, not the key itself.
JaredPar
@Jared: They aren't "simply wrong". In the default case--and, arguably, MOST cases--there *is* a direct correlation. Even the most common substitute, `StringComparer.*IgnoreCase` is still going to go character by character (until it doesn't find a match, obviously).
Adam Robinson
@Adam, I'm not trying to be argumentative but it is wrong to say that a String key size has a direct performance impact. It may be the most likely and default case but it's an incorrect assertion. You simply cannot look at the Key type of a Dictionary and make a solid statement about the performance characteristics of lookup. Only knowing the details of the IEqualityComparer<TKey> can give you this information.
JaredPar
@Jared: Yes, obviously you're correct. You could implement an `IEqualityComparer<T>` (for `string` or anything else) that is O(1). That is, however, *highly unlikely*. The distinction that you're making is academic, and I would imagine causes more confusion than it adds clarity.
Adam Robinson
@Adam, I disagree that it's an academic discussion specifically because i've seen exactly these types of `IEqualityComparer<T>` implementations (more than 1). And in at least one of the cases it was a good idea and fit the scenario. Just because it's unlikely doesn't mean you won't encounter it.
JaredPar
@Jared: Right, as I said, it's unlikely. If I meant you won't encounter it then I'd have said it won't happen at all ;) We'll agree to disagree, then.
Adam Robinson
@Adam, agreed. The more code I see, the more things I no longer take for granted :(
JaredPar
+7  A: 

I would recommend using an actual Guid rather than the string representation of the Guid. Yes, when comparing strings the length does affect the number of operations required, since it has to compare the strings character-by-character (at a bare minimum; this is barring any special options like IgnoreCase). The actual Guid will give you only 16 bytes to compare rather than the minimum of 32 in the string.

That being said, you are very likely not going to notice any difference...premature optimization and all that. I would simply go for the Guid key since that's what the data is.

Adam Robinson
Indeed, hashing a Guid will be considerably faster than hashing a 32 character string. +1
spender
Thanks, that sounds reasonable.
Andrej
A: 

I did a quick Google search and found this article.

http://dotnetperls.com/dictionary-string-key

It confirms that generally shorter keys perform better than longer ones.

ddc0660
This is not really a good test as it shows a performance gain of only 3 seconds for 100 million dictionary lookups when comparing 2 and 20 character keys. This is not a significant gain in most uses of a dictionary. In every real world usage of Dictionaries that I can find, the performance gains will be in nano seconds. So your statement "that generally shorter keys perform better than longer ones" is misleading. "Generally", it is insignificant.
Daniel Dyson
+1  A: 

see http://stackoverflow.com/questions/713109/performance-using-guid-object-or-guid-string-as-key for a similar question. You could test it out with an alternative key.

Michael Gattuso