tags:

views:

70

answers:

3

I need a faster way to create an index file. The application generates pairs of items to be indexed. I currently add each pair as it is generated to a sorted dictionary and then write it out to a disk file. This works well until the number of items added exceeds one million, at which time it slows to the point that is unacceptable. There can be as many as three million data items to be indexed. I prefer to avoid a database because I do not want to significantly increase the size of the deployment package, which is now less than one-half of one megabyte. I tried Access but it is even slower than the sorted dictionary -if it had an efficient bulk load utility then that might work, but I do not find such a tool for Access.

Is there a better way to roll my own index?

+5  A: 

Is the SortedDictionary really the bottleneck? As compared to the I/O ?
You really should profile this first to prevent optimizing the wrong parts.

But as a tip, of you have 1M or more items it is a good idea to pre-allocate your Dictionary. Give it an initial capacity of 2M or so.

var index = new SortedDictionary(2 * 1024 * 1024);

If your Dictionary is the problem I would expect it to be from constant re-allocation sooner than from the actual index-searches.

Henk Holterman
I will try that. There is no IO until the SD is loaded, then the entire SD is written out.
bill seacham
SortedDictionary does not appear to have a constructor that allows for initial size allocation, as some of the other structures do. Is there some other way to pre-allocate its size?
bill seacham
@bill, you're right, SortedDictionary doesn't support that. SortedList does, and you could give it a try as a replacement. Even though at first face it has worse insertion behaviour.
Henk Holterman
+1  A: 

Just a thought, but could you use an in-memory SQL solution, like SQL Lite ? It's just a small DLL, but will help your priorities, do your logic in C# and do your sorting in SQL.

Have a look here:

http://www.mikeduncan.com/sqlite-on-dotnet-in-3-mins/

The Download for SQL Lite itself is only 253k, and the .net bindings are about 75k.

Russ C
sqlite.org seems to have taken the day off -no website. I Googled it but the link goes nowhere. So is it an opensource project?
bill seacham
Hi Bill, sorry for the late reply, the SQL Lite website appears to be back again, the SqlLite ADO.Net provider is on sourceforge.
Russ C
A: 

Is SQLite too big to deploy with your software? I'm going to agree with Henk that the constant reallocations within the SortedDictionary are probably the bottleneck. If that solution proves false, try using SQLite to see if that increases performance, and then you can decide on where to go from there.

Josh Smeaton
SqlLite + dot.net bindings are around 350kb unzipped. likely a lot smaller if you remove the parts you don't need for C#.
Russ C