views: 310

answers: 1
I am currently using a Hashtable to store a list of unique identifiers and associated data, all of which is read in from a file.

The length of this data file can vary greatly, from 1 entry to several hundred thousand. I've noticed a significant slowdown in the speed of adding entries to the Hashtable once it gets past about 50,000 entries.

I think setting the initial capacity might help, but obviously I can't know this number since the data is read from a file. Can anyone suggest a way to speed up adding a lot of entries, or is this behavior pretty normal?

edit: Right now I am just using a Hashtable. I think it should probably be Dictionary<string, MyDataObject>, but that seems like a separate issue.
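In case it helps, here is a minimal sketch of what the Dictionary<string, MyDataObject> version could look like, with the capacity pre-sized from a rough estimate based on the file size. The tab-separated line format, the 32-byte average line length, and the Payload field on MyDataObject are assumptions, since the real file layout isn't shown.

    using System;
    using System.Collections.Generic;
    using System.IO;

    class MyDataObject
    {
        public string Payload;   // placeholder for the real per-entry data
    }

    class Loader
    {
        static Dictionary<string, MyDataObject> Load(string path)
        {
            // Rough capacity guess: file size divided by an assumed average
            // line length (32 bytes here), so the dictionary rarely resizes.
            var info = new FileInfo(path);
            int estimatedEntries = (int)Math.Max(16, info.Length / 32);

            var map = new Dictionary<string, MyDataObject>(estimatedEntries);

            using (var reader = new StreamReader(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Assumed format: "id<TAB>data" -- adjust to the real layout.
                    int tab = line.IndexOf('\t');
                    if (tab < 0) continue;

                    string id = line.Substring(0, tab);
                    map[id] = new MyDataObject { Payload = line.Substring(tab + 1) };
                }
            }
            return map;
        }
    }

Even a generous overestimate of the capacity is usually cheaper than repeatedly rehashing as the collection grows.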

+1  A: 

See here for a comparison of Hashtable and Dictionary for large numbers of items.

Preet Sangha
I didn't think the difference would be so drastic - it looks like switching to a Dictionary would go a long way towards solving my problem. However, I cannot test right now, and I suspect I would see the same kind of slowdown on a smaller scale with a Dictionary.
jnylen
The comparison is nevertheless interesting, because it tests with 10,000,000 keys and a GUID as the id. It takes ~6 seconds, so there should be no bottleneck for 50,000 entries ... That's why I think it could be the file reading rather than the insert ...
tanascius
This benchmark is not very good because new GUIDs are generated inside the timed loop, and GUID generation is slow compared to a hash table access. In a quick test I found that creating a new GUID takes about 6 times as long as an insert into a Dictionary<Int32, Guid>.
Daniel Brückner
OK, that is a problem - but not in this context. Inserting 10,000,000 entries in ~6 seconds, even with a GUID being created inside the timed loop, is fast and should not result in a bottleneck.
tanascius
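For reference, a minimal sketch of the kind of benchmark being discussed here, with GUID generation hoisted out of the timed loops so that only the inserts are measured. The entry count and the use of int values are assumptions.

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Diagnostics;

    class InsertBenchmark
    {
        static void Main()
        {
            const int count = 1000000;  // assumed entry count, adjust as needed

            // Generate all keys up front so GUID creation cost is excluded
            // from the timing.
            var keys = new Guid[count];
            for (int i = 0; i < count; i++)
                keys[i] = Guid.NewGuid();

            var sw = Stopwatch.StartNew();
            var dict = new Dictionary<Guid, int>();
            for (int i = 0; i < count; i++)
                dict.Add(keys[i], i);
            sw.Stop();
            Console.WriteLine("Dictionary inserts: {0} ms", sw.ElapsedMilliseconds);

            sw.Reset();
            sw.Start();
            var table = new Hashtable();
            for (int i = 0; i < count; i++)
                table.Add(keys[i], i);   // keys and values are boxed here
            sw.Stop();
            Console.WriteLine("Hashtable inserts:  {0} ms", sw.ElapsedMilliseconds);
        }
    }

The Hashtable loop boxes both the Guid keys and the int values, which is part of why the non-generic collection tends to fall behind the generic Dictionary in comparisons like the one linked above.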