I read a table with more than 1 million records from a database. It takes 3 minutes until I have a populated object in memory. I wanted to optimize this process, so I serialized the object to a file using BinaryFormatter. The resulting file was about 0.5 GB. I then deserialized the file back into an in-memory object, and that took 11 minutes!

The question: why is it so much faster to read all this data from a database than from a file? Is it possible to optimize the deserialization process somehow?

The database is on the same machine where I ran this test. No other process was using CPU time during the test. The CPU has 4 cores and the machine has 40 GB of memory.

Edit: Code for deserializing:

    using (FileStream fs = new FileStream(filename, FileMode.Open))
    {
        var bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        var data = (MyType)bf.Deserialize(fs);
        ...
    }
A: 

I've found serialization to carry quite a lot of overhead, and I'd expect deserializing this much data from a file to take a lot longer than querying it from a database. You're also reading the data from a file on disk, so that's going to be expensive as well. If you're trying to cache data, it's better to look at an in-memory option.

Charlie
After deserializing, the data will not be written to disk.
Kamarey
Sorry, that was a typo in my answer. I meant that you read the data from disk and then deserialize it. You want to store it in a file on disk rather than in a database, and that's going to be more of an overhead. Basically, it sounds like you should stick to a database as your datastore. Otherwise you might want to cache portions of your data in memory, but not by serializing it and writing it out to a file.
Charlie
A: 

I have noticed that it takes quite a while for .NET to initialize XmlSerializers. So if you are not already reusing the serializer objects, doing so should speed up the process significantly.
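
For example, a minimal sketch of reusing a single cached serializer instance (the MyType name and the static field are illustrative, not from the original question; requires System.IO and System.Xml.Serialization):

    // Construct the serializer once and reuse it; creating a new
    // XmlSerializer(typeof(MyType)) on every call pays the
    // code-generation cost each time.
    static readonly XmlSerializer CachedSerializer = new XmlSerializer(typeof(MyType));

    static MyType Load(string filename)
    {
        using (var fs = new FileStream(filename, FileMode.Open))
        {
            return (MyType)CachedSerializer.Deserialize(fs);
        }
    }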

Skrim
This is binary serialization, not XML. But maybe this applies to the binary serializer as well...
Kamarey
+2  A: 

Because of the way BinaryFormatter works, it is painfully slow. It injects a lot of reflection-based metadata into the binary file. I ran some tests against some rather large structures a few years back and found that XmlSerializer is both smaller and faster than the binary formatter. Go figure!

In either case, the serialization is done via reflection, which is slow. You might consider writing your own serialization mechanism.

I once created my own binary serialization mechanism (using direct file writes/reads), and it performed 20 times faster than the XML serializer, which in turn was faster than the binary formatter. The output was also significantly smaller.

You might want to consider doing something like that.
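
Something along these lines, for example. This is only a rough sketch: MyRecord with its Id and Name fields is a hypothetical stand-in for whatever your table rows actually contain.

    // Hand-rolled binary persistence with BinaryWriter/BinaryReader,
    // avoiding the reflection and metadata overhead of BinaryFormatter.
    static void Save(string path, List<MyRecord> records)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(records.Count);              // record count first
            foreach (var r in records)
            {
                writer.Write(r.Id);                   // fixed-size field
                writer.Write(r.Name ?? string.Empty); // length-prefixed string
            }
        }
    }

    static List<MyRecord> Load(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = reader.ReadInt32();
            var records = new List<MyRecord>(count);
            for (int i = 0; i < count; i++)
            {
                records.Add(new MyRecord
                {
                    Id = reader.ReadInt32(),
                    Name = reader.ReadString()
                });
            }
            return records;
        }
    }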

Brian Genisio
A: 

I concur with Brian's post. If you write your own persistence logic you can eliminate the overhead of reflection calls and have full control over how data is loaded from disk. You'll have to write more code, but in this case that may be the price of optimisation.

Andy Holt
A: 

Have you profiled your specific scenario?

The comments assume the issue is reflection, but I'm currently working on a similar scenario (deserialization from a file into an in-memory object tree), and what seems to be happening is that BinaryFormatter.Deserialize() reads bytes one by one, or in very small chunks, in order to re-hydrate the objects.

Here are the three functions with the most exclusive time in my profile:

29% Microsoft.Win32.Win32Native::ReadFile
22% Microsoft.Win32.Win32Native::SetFilePointerWin32
12% Microsoft.Win32.Win32Native::SetFilePointer

I wonder if there's a way to tell BinaryFormatter to read in larger chunks, for example. I tried BufferedStream and MemoryStream with no luck...
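
For reference, pre-loading the whole file into memory before deserializing looks roughly like this (just a sketch of the idea, reusing the MyType name from the question; it didn't help in my case):

    // Read the entire file into RAM first, so BinaryFormatter's many small
    // reads and seeks hit a MemoryStream instead of the disk.
    using (var ms = new MemoryStream(File.ReadAllBytes(filename)))
    {
        var bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        var data = (MyType)bf.Deserialize(ms);
    }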

Ariel
Do you know if XmlSerializer uses BinaryFormatter? If yes, it should be as slow as the binary one too.
Kamarey
+1  A: 

http://www.codeproject.com/KB/cs/FastSerialization.aspx

This is a good customised serialisation class that I'm using in my own project, and it's pretty easy to use. Would recommend.

Apparently it's 4x faster than the standard formatter, and having easy access to the internal structures means you could add your own .NET 4 parallel improvements.

JaredBroad
Thanks, looks interesting
Kamarey