We have a very large Dictionary<long, uint> (several million entries) as part of a high-performance C# application. When the application closes we serialise the dictionary to disk using BinaryFormatter and MemoryStream.ToArray(). The serialisation completes in about 30 seconds and produces a file about 200MB in size. When we then try to deserialise the dictionary using the following code:

BinaryFormatter bin = new BinaryFormatter();
Dictionary<long, uint> allPreviousResults;
using (Stream stream = File.Open("filePathName", FileMode.Open))
{
    allPreviousResults =
        (Dictionary<long, uint>)bin.Deserialize(stream);
}

It takes about 15 minutes to return. We have tried alternatives, and the slow part is definitely bin.Deserialize(stream); the bytes themselves are read from the hard drive (a high-performance SSD) in under 1 second.
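
The save side, for reference, is roughly the following (simplified sketch; names are illustrative):

// Save side, simplified: serialise into a MemoryStream, then dump the bytes to disk.
// (Requires System.IO and System.Runtime.Serialization.Formatters.Binary.)
BinaryFormatter bin = new BinaryFormatter();
using (var memory = new MemoryStream())
{
    bin.Serialize(memory, allPreviousResults);
    File.WriteAllBytes("filePathName", memory.ToArray());
}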

Can someone please point out what we are doing wrong? We would like the load time to be on the same order as the save time.

Regards, Marc

+2  A: 

You may want to use a profiler to see if, behind the scenes, the deserializer is performing a bunch of on-the-fly reflection.

For now, if you don't want to use a database, try storing your objects as a flat file in a custom format. For example, the first line of the file gives the total number of entries in the dictionary, allowing you to instantiate a dictionary with a predetermined size. Have the remaining lines be a series of fixed-width key-value pairs representing all of the entries in your dictionary.

With your new file format, use a StreamReader to read in your file line-by-line or in fixed blocks, and see if this allows you to read your dictionary in any faster.
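
A minimal sketch of this layout (first line = entry count, then one key-value pair per line; space-separated here rather than fixed-width, and the type and method names are purely illustrative):

using System.Collections.Generic;
using System.IO;

static class FlatFileStore
{
    // Writes: first line = entry count, then one "key value" line per entry.
    public static void Save(string path, Dictionary<long, uint> data)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine(data.Count);
            foreach (var pair in data)
            {
                writer.WriteLine("{0} {1}", pair.Key, pair.Value);
            }
        }
    }

    // Reads the count first so the dictionary can be pre-sized.
    public static Dictionary<long, uint> Load(string path)
    {
        using (var reader = new StreamReader(path))
        {
            int count = int.Parse(reader.ReadLine());
            var data = new Dictionary<long, uint>(count);
            for (int i = 0; i < count; i++)
            {
                var parts = reader.ReadLine().Split(' ');
                data.Add(long.Parse(parts[0]), uint.Parse(parts[1]));
            }
            return data;
        }
    }
}

Pre-sizing the dictionary with the stored count avoids repeated rehashing while it is rebuilt.
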

Juliet
@Juliet Good point on sizing the dictionary before adding the entries. When investigating this approach, I would suggest using a BinaryReader/BinaryWriter, as reading millions of strings, creating millions of strings, and then parsing millions of longs and uints out of those strings will have performance issues of their own.
chibacity
See @Darin's example.
chibacity
+11  A: 

You may check out protobuf-net, or simply serialize it yourself, which will probably be the fastest you can get.

using System.Collections.Generic;
using System.IO;

class Program
{
    public static void Main()
    {
        // Build a test dictionary with 7.5 million entries.
        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }

        // Write each key/value pair as raw binary: 8 bytes + 4 bytes per entry.
        using (var stream = File.OpenWrite("data.dat"))
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var pair in dico)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
        }

        dico.Clear();

        // Read the pairs back until the end of the file is reached.
        using (var stream = File.OpenRead("data.dat"))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                var key = reader.ReadInt64();
                var value = reader.ReadUInt32();
                dico.Add(key, value);
            }
        }
    }
}

Size of the resulting file: 90,000,000 bytes (85.8 MB).
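
A possible refinement (not part of this answer) is to combine the binary format above with Juliet's pre-sizing suggestion by writing the entry count at the start of the file; a sketch, reusing the dico variable from the example:

// Sketch: same binary layout, but with the entry count written first
// so the dictionary can be constructed with the right capacity on load.
using (var stream = File.Create("data.dat"))
using (var writer = new BinaryWriter(stream))
{
    writer.Write(dico.Count);
    foreach (var pair in dico)
    {
        writer.Write(pair.Key);
        writer.Write(pair.Value);
    }
}

Dictionary<long, uint> loaded;
using (var stream = File.OpenRead("data.dat"))
using (var reader = new BinaryReader(stream))
{
    int count = reader.ReadInt32();
    loaded = new Dictionary<long, uint>(count);
    for (int i = 0; i < count; i++)
    {
        loaded.Add(reader.ReadInt64(), reader.ReadUInt32());
    }
}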

Darin Dimitrov
Just ran this code using a dictionary with 20M key-value pairs, producing a file 234MB in size. Performance on an i7 (4GHz), 8GB DDR3 RAM, Vertex 2 SSD: dictionary build and write-to-file time 2.17 secs; dictionary read-from-file and rebuild time 15.39 secs. If we can maintain that sort of performance it should work very well.
MarcF
+1: wonderful solution :)
Juliet
Just finished implementing this solution in our actual app, and the results were similar to the performance times I posted previously (i.e. excellent). I was a little worried that having non-consecutive keys might cause a problem, but that worry was unjustified (it does not seem to make a difference). Again, many thanks!!
MarcF
+1  A: 

There are several fast key-value NoSQL solutions out there, so why not try one of them? ESENT, for example - somebody posted the ManagedEsent wrapper here at SO.
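
Purely as an illustration (this snippet is not from the answer, and it assumes ManagedEsent's PersistentDictionary class), the dictionary could be kept on disk like this:

using Microsoft.Isam.Esent.Collections.Generic;

class EsentExample
{
    static void Main()
    {
        // PersistentDictionary stores its data in the given directory via ESENT,
        // so there is no separate serialise/deserialise step at shutdown.
        using (var results = new PersistentDictionary<long, uint>("ResultsDb"))
        {
            results[42L] = 7u;
        }
    }
}

The trade-off is that every update goes through the database engine rather than an in-memory Dictionary.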

bassfriend
+3  A: 

Just to show similar serialization (to the accepted answer) via protobuf-net:

using System.Collections.Generic;
using ProtoBuf;
using System.IO;

[ProtoContract]
class Test
{
    [ProtoMember(1)]
    public Dictionary<long, uint> Data {get;set;}
}

class Program
{
    public static void Main()
    {
        // Pre-generate the serialization code rather than building it lazily on first use.
        Serializer.PrepareSerializer<Test>();
        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }
        var data = new Test { Data = dico };
        using (var stream = File.OpenWrite("data.dat"))
        {
            Serializer.Serialize(stream, data);
        }
        dico.Clear();
        using (var stream = File.OpenRead("data.dat"))
        {
            // Merge deserializes back into the existing Test instance (and its cleared dictionary).
            Serializer.Merge<Test>(stream, data);
        }
    }
}

Size: 83MB - but most importantly, you haven't had to do it all by hand, introducing bugs. It's fast too (and will be even faster in "v2").

Marc Gravell