I have about 3000 different files I need to organize and retrieve at different times during the game.

I created my own struct of variables. I was thinking about creating a "Dictionary" at the beginning of my application and simply loading all my files before the game starts.

I'm wondering about performance: will a dictionary with this many entries cause my application to be slow? Would a large dictionary make "TryGetValue" and "ContainsKey" run slower?

thanks for the advice!

+8  A: 

3000 entries is piddling for a Dictionary<>. That will not be a source of slowdown.

Reading 3000 different files into memory at startup, on the other hand, will be slow. You'll be much better off reading files into memory only at the time they're needed, but keeping them in memory afterwards for subsequent accesses.
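As a minimal sketch of that load-on-demand pattern (FileData and FileCache here are hypothetical stand-ins for the asker's own struct and loading code):

    using System.Collections.Generic;
    using System.IO;

    // Hypothetical stand-in for the asker's own struct of variables.
    struct FileData
    {
        public byte[] Bytes;
    }

    class FileCache
    {
        private readonly Dictionary<string, FileData> cache =
            new Dictionary<string, FileData>();

        public FileData GetFile(string path)
        {
            FileData data;
            // Hit the disk only the first time a file is requested;
            // later requests come straight from the dictionary.
            if (!cache.TryGetValue(path, out data))
            {
                data = new FileData { Bytes = File.ReadAllBytes(path) };
                cache[path] = data;
            }
            return data;
        }
    }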

JSBangs
Since he's mentioning a game, that normal rule may not apply. Depending on when they'd load and how big they are, it may be better to load them up behind the startup splash screen rather than while Ashe is trying to smite a Deadite...
AllenG
@AllenG, good point.
JSBangs
Arguably, they could spawn a background thread during startup that performs the process.
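A rough sketch of that idea (PreloadAll is a hypothetical method that fills the cache from disk):

    using System.Threading;

    // Start loading on a background thread while the splash screen shows.
    // PreloadAll is a hypothetical void method that reads the files.
    Thread loader = new Thread(PreloadAll) { IsBackground = true };
    loader.Start();

    // ... render the splash screen, then block until loading finishes:
    loader.Join();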
Steven Sudit
A: 

Dictionaries in .NET use a hash table lookup scheme, so adding entries has very little, if any, effect on lookup performance. The only issue you might have is memory usage. A dictionary of 3000 items will consume roughly 3000 times the storage used by one key plus one value. If it's just a simple struct without huge binary blobs, 3000 is downright tiny.

recursive
+3  A: 

No, it won't. It will consume memory, but TryGetValue and ContainsKey should be pretty fast: a dictionary is a hash table, so access to elements by key is constant time and doesn't depend on the number of elements.

Darin Dimitrov
+1 - Good answer
JonH
+3  A: 

Provided the hash code algorithm for the dictionary key type spreads the hash codes relatively evenly across the Int32 space, hash code lookup is unaffected by dictionary size.

See http://en.wikipedia.org/wiki/Hashtable#Performance_analysis for more details.
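For a custom key type, one common way to get that even spread is to combine the fields' hashes with prime multipliers. A sketch (FileKey is a hypothetical example, not the asker's type):

    struct FileKey
    {
        public string Name;
        public int Version;

        public override int GetHashCode()
        {
            unchecked // let the arithmetic wrap instead of throwing
            {
                int hash = 17;
                hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
                hash = hash * 31 + Version;
                return hash;
            }
        }

        public override bool Equals(object obj)
        {
            if (!(obj is FileKey)) return false;
            FileKey other = (FileKey)obj;
            return Name == other.Name && Version == other.Version;
        }
    }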

Paul Ruane
+1 for pointing out that this only works if the hash isn't broken.
Steven Sudit
A: 

Your bottleneck will not be the dictionary's performance but rather the reading of 3000 files.

AndyC
A: 

As with most things with computers (and particularly with performance), "It Depends (tm)"

It all depends on the implementation of Dictionary.

It could be done as a binary tree, in which case lookup should be O(log2 N), which means lookup time grows slowly as the size of the dictionary grows.

It could be done as a hash table, which in theory is O(1), meaning a lookup will always take the same amount of time regardless of the size of the dictionary. But that's the theory, and it depends on the number of buckets and the quality of the hash code. If many items end up in the same bucket, requiring a linear search, things will slow down considerably as the dictionary grows.

However, the dictionary would have to grow beyond 3000 by several orders of magnitude before you'd see a noticeable difference.

James Curran
Dictionary<> is spec'd as using a hash table.
Jon Hanna
A: 

There is a limit, but 3000 is nowhere near it. Dictionary<> uses Object.GetHashCode() to organise its keys, which returns an int. Therefore you can store a maximum of 2^32 (4,294,967,296) keys before there is a collision. However, due to the way .NET's hash codes are usually calculated, there will likely be many collisions as you get close to this magic number.

Adding more keys will not slow down TryGetValue and ContainsKey - they are O(1) operations.

Callum Rogers
Your last two sentences conflict. If two keys collide, looking up one will take longer than a key with a unique hash code.
James Curran
Nothing to do with the way .NET's hash codes are usually calculated, and everything to do with basic math. First, if you have more than 2^32 possible values (true for most types) then it's impossible to guarantee uniqueness unless you know the values in advance and can create a perfect hash. Also, the dictionary won't start out by taking up 16GB worth of pointer space so it can have 2^32 slots (could be more if it isn't ref types being stored), but will reduce it, so more collisions. Despite this though, with well-spread-out bits, collisions will generally be rare. A few collisions won't hurt much.
Jon Hanna
+2  A: 

TryGetValue and ContainsKey should be pretty fast at that size, as long as the key type has well-distributed hashes.

A Dictionary has an indexable number of "buckets". When it adds or looks up a value by key, it takes the value returned by GetHashCode(), reduces it again so it's less than the number of buckets (generally something simple like a modulo, but the exact implementation isn't defined), and looks in the relevant bucket.

The bucket will contain zero or more items. The dictionary compares the sought key with each item's key using .Equals().

The first bit, finding the right bucket, happens in constant time O(1). The second bit, comparing the key with the keys in the bucket, happens in linear time O(n), where n relates only to the number of items in that bucket, not to the whole collection.

Generally there should be very few items in each bucket (the number of buckets will grow to try to keep this the case) so the operation is essentially constant time.

If, however, your hash codes are poorly implemented, there will be lots of keys in the same bucket. The time complexity gets closer and closer to O(n), as can be seen by experimenting with an object whose deliberately bad GetHashCode just returns 0 every time. In its worst case it is worse than a List, since a List is also O(n) but Dictionary has more overhead.
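That degenerate case is easy to reproduce with a deliberately broken key type, e.g.:

    class BadKey
    {
        private readonly int id;
        public BadKey(int id) { this.id = id; }

        // Deliberately awful: every instance lands in the same bucket,
        // so every lookup degrades to a linear scan over all entries.
        public override int GetHashCode() { return 0; }

        public override bool Equals(object obj)
        {
            BadKey other = obj as BadKey;
            return other != null && other.id == id;
        }
    }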

Does any of this mean you should worry? No, even relatively naïve hashing methods should give relatively good results. If you're using a string key, then it's probably already going to be more than good enough. If you're using a simple built-in type, then even more so.

If you do find that accessing the dictionary is slow, though, then you want to pay attention to this and either fix the GetHashCode() method or create an IEqualityComparer (which lets you define external rules for GetHashCode() and Equals() for use with dictionaries, hash sets, etc.).
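As a sketch, a comparer for case-insensitive string keys might look like this (for strings specifically, the framework's built-in StringComparer.OrdinalIgnoreCase already does the same job):

    using System;
    using System.Collections.Generic;

    class CaseInsensitiveKeyComparer : IEqualityComparer<string>
    {
        public bool Equals(string x, string y)
        {
            return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
        }

        public int GetHashCode(string obj)
        {
            // Equal keys must produce equal hashes, so normalise case first.
            return obj == null ? 0 : obj.ToUpperInvariant().GetHashCode();
        }
    }

    // Usage: the dictionary now hashes and compares keys with the comparer
    // instead of the key type's own GetHashCode()/Equals().
    var files = new Dictionary<string, object>(new CaseInsensitiveKeyComparer());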

Most likely though, 3000 is nothing, it'll be fine.

Jon Hanna