Hi There,

I am trying to export a StringDictionary to a text file. It has over one million records, and the export takes over 3 minutes when I use a loop.

Is there a way to do that faster?

Regards

+4  A: 

Well, it depends on what format you're using for the export, but in general, the biggest overhead for exporting large amounts of data is going to be I/O. You can reduce this by using a more compact data format, and by doing less manipulation of the data in memory (to avoid memory copies) if possible.

The first thing to do is check your disk I/O speed and profile the code that does the writing.
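As a rough way to get a baseline for that comparison, here is a minimal sketch that times a raw sequential write and reports MB/s. The helper name `MeasureWriteThroughput`, the scratch file name and the 64 MB test size are all hypothetical choices, and OS caching can still inflate the number, so treat the result as an upper bound rather than a precise measurement.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: writes totalBytes of filler to path and returns the observed MB/s.
static double MeasureWriteThroughput(string path, long totalBytes)
{
    byte[] buffer = new byte[1 << 20];          // 1 MB of zeros as filler data
    Stopwatch watch = Stopwatch.StartNew();
    using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, buffer.Length))
    {
        long written = 0;
        while (written < totalBytes)
        {
            int chunk = (int)Math.Min(buffer.Length, totalBytes - written);
            stream.Write(buffer, 0, chunk);
            written += chunk;
        }
        stream.Flush(true);                     // also ask the OS to push its cache to disk
    }
    watch.Stop();
    return (totalBytes / (1024.0 * 1024.0)) / watch.Elapsed.TotalSeconds;
}

// Usage: write 64 MB to a scratch file, report the rate, then clean up.
double mbPerSecond = MeasureWriteThroughput("throughput-test.tmp", 64L * 1024 * 1024);
Console.WriteLine("Raw sequential write: {0:F1} MB/s", mbPerSecond);
File.Delete("throughput-test.tmp");
```

If the export loop writes at a rate close to this figure, the disk is the likely bottleneck; if it is far below, the time is probably going into CPU work.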

If you're maxing out your disk I/O (e.g., writing at a good percentage of disk speed, which would be many tens of megabytes per second on a modern system), you could consider compressing the data before you write it. This uses more CPU, but you write less to the disk when you do this. This will also likely increase the speed of reading the file, if you have the same bottleneck on the reading side.
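A minimal sketch of compressing on the way to disk, using `GZipStream` from `System.IO.Compression`. The helper names, the file name and the tab-separated line format are assumptions for illustration; whether this is actually faster depends on how compressible the data is and how much CPU is spare.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: writes "key<TAB>value" lines through a gzip stream.
static void WriteCompressed(StringDictionary data, string path)
{
    using (var file = File.Create(path))
    using (var gzip = new GZipStream(file, CompressionMode.Compress))
    using (var output = new StreamWriter(gzip, Encoding.UTF8))
    {
        foreach (DictionaryEntry pair in data)
        {
            output.Write((string)pair.Key);
            output.Write('\t');
            output.WriteLine((string)pair.Value);
        }
    }
}

// Hypothetical helper: reads the file back and counts the lines, to check the round trip.
static int CountCompressedLines(string path)
{
    using (var gzip = new GZipStream(File.OpenRead(path), CompressionMode.Decompress))
    using (var input = new StreamReader(gzip, Encoding.UTF8))
    {
        int lines = 0;
        while (input.ReadLine() != null) lines++;
        return lines;
    }
}

// Usage with a tiny sample dictionary.
var sample = new StringDictionary();
sample.Add("first", "1");
sample.Add("second", "2");
WriteCompressed(sample, "sample.txt.gz");
Console.WriteLine(CountCompressedLines("sample.txt.gz") + " lines written");
```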

If you're maxing out your CPU, you need to do less processing work on the data before writing it. If you're using a serialization library, for example, avoiding that and switching to a simpler, more specialized data format might help. Consider the simplest format you need: probably just a word for the length of the string, followed by the string data itself, repeated for every key and value.
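A minimal sketch of such a length-prefixed format, assuming `BinaryWriter` is an acceptable way to produce it. `BinaryWriter.Write(string)` prefixes each string with its byte length as a variable-length (7-bit encoded) integer rather than a fixed-size word, and the leading entry count is an addition made here so the reader knows when to stop. The helper names and file name are hypothetical, and the sketch assumes no null values in the dictionary.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;
using System.IO;
using System.Text;

// Hypothetical helper: writes the entry count, then each key and value length-prefixed.
static void WriteBinary(StringDictionary data, string path)
{
    using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, 1 << 16))
    using (var writer = new BinaryWriter(stream, Encoding.UTF8))
    {
        writer.Write(data.Count);               // 4-byte entry count
        foreach (DictionaryEntry pair in data)
        {
            // Assumes non-null values; Write(string) throws on null.
            writer.Write((string)pair.Key);
            writer.Write((string)pair.Value);
        }
    }
}

// Hypothetical helper: reads the same layout back into a new dictionary.
static StringDictionary ReadBinary(string path)
{
    var result = new StringDictionary();
    using (var reader = new BinaryReader(File.OpenRead(path), Encoding.UTF8))
    {
        int count = reader.ReadInt32();
        for (int i = 0; i < count; i++)
        {
            string key = reader.ReadString();
            string value = reader.ReadString();
            result.Add(key, value);
        }
    }
    return result;
}

// Usage: round-trip a tiny sample.
var sample = new StringDictionary();
sample.Add("alpha", "1");
sample.Add("beta", "two");
WriteBinary(sample, "sample.bin");
StringDictionary roundTrip = ReadBinary("sample.bin");
Console.WriteLine(roundTrip["beta"]);
```

The trade-off is that the file is no longer human-readable, so this only makes sense if the consumer of the export can read the same layout.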

Curt Sampson
"If you're using a serialization library, for example, avoiding that and switching to a simpler, more specialized data format might help." - or use a faster serialization library ;-p
Marc Gravell
+3  A: 

Note that most dictionary constructs don't preserve insertion order, which often makes them poor choices if you want repeatable file contents. That said, depending on the size, we may be able to improve on the time: the code below takes about 3.5s (for the export) to write just under 30MB:

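    using System;
    using System.Collections;
    using System.Collections.Specialized;
    using System.Diagnostics;
    using System.IO;

    // Build one million sample entries (fixed seed so runs are repeatable).
    StringDictionary data = new StringDictionary();
    Random rand = new Random(123456);
    for (int i = 0; i < 1000000; i++)
    {
        data.Add("Key " + i, "Value = " + rand.Next());
    }

    // Time only the export, not the setup above.
    Stopwatch watch = Stopwatch.StartNew();
    using (TextWriter output = File.CreateText("foo.txt"))
    {
        foreach (DictionaryEntry pair in data)
        {
            output.Write((string)pair.Key);
            output.Write('\t');
            output.WriteLine((string)pair.Value);
        }
        // No explicit Close() needed: leaving the using block flushes and closes the writer.
    }
    watch.Stop();
    Console.WriteLine(watch.ElapsedMilliseconds + " ms");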

Obviously the performance will depend on the size of the actual data getting written.

Marc Gravell