I have been using BinaryFormatter to serialise data to disk but it doesn't seem very scalable. I've created a 200Mb data file but am unable to read it back in (End of Stream encountered before parsing was completed). It tries for about 30 minutes to deserialise and then gives up. This is on a fairly decent quad-cpu box with 8Gb RAM.
I'm serialising a fairly large complicated structure.
htCacheItems is a Hashtable of CacheItems. Each CacheItem has several simple members (strings + ints etc) and also contains a Hashtable and a custom implementation of a linked list. The sub-hashtable points to CacheItemValue structures which is currently a simple DTO which contains a key and a value. The linked list items are also equally simple.
The data file that fails contains about 400,000 CacheItemValues.
Smaller datasets work well (though takes longer than i'd expect to deserialize and use a hell of a lot of memory).
public virtual bool Save(String sBinaryFile)
{
bool bSuccess = false;
FileStream fs = new FileStream(sBinaryFile, FileMode.Create);
try
{
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(fs, htCacheItems);
bSuccess = true;
}
catch (Exception e)
{
bSuccess = false;
}
finally
{
fs.Close();
}
return bSuccess;
}
public virtual bool Load(String sBinaryFile)
{
bool bSuccess = false;
FileStream fs = null;
GZipStream gzfs = null;
try
{
fs = new FileStream(sBinaryFile, FileMode.OpenOrCreate);
if (sBinaryFile.EndsWith("gz"))
{
gzfs = new GZipStream(fs, CompressionMode.Decompress);
}
//add the event handler
ResolveEventHandler resolveEventHandler = new ResolveEventHandler(AssemblyResolveEventHandler);
AppDomain.CurrentDomain.AssemblyResolve += resolveEventHandler;
BinaryFormatter formatter = new BinaryFormatter();
htCacheItems = (Hashtable)formatter.Deserialize(gzfs != null ? (Stream)gzfs : (Stream)fs);
//remove the event handler
AppDomain.CurrentDomain.AssemblyResolve -= resolveEventHandler;
bSuccess = true;
}
catch (Exception e)
{
Logger.Write(new ExceptionLogEntry("Failed to populate cache from file " + sBinaryFile + ". Message is " + e.Message));
bSuccess = false;
}
finally
{
if (fs != null)
{
fs.Close();
}
if (gzfs != null)
{
gzfs.Close();
}
}
return bSuccess;
}
The resolveEventHandler is just a work around because i'm serialising the data in one application and loading it in another (http://social.msdn.microsoft.com/Forums/en-US/netfxbcl/thread/e5f0c371-b900-41d8-9a5b-1052739f2521)
The question is, how can I improve this? Is data serialisation always going to be inefficient, am i better off writing my own routines?