I read a table with more than 1 million records from a database. It takes 3 minutes until I have a populated object in memory. I wanted to optimize this process, so I serialized the object to a file using BinaryFormatter. The resulting file was about 0.5 GB. I then deserialized the file back into an in-memory object, and that took 11 minutes!

The question: why is it so much faster to read all this data from a database than from a file? Is it possible to optimize the deserialization process somehow?

The database is on the same machine where I ran this test. No other process was using CPU time during the test. The CPU has 4 cores and the machine has 40 GB of memory.

Edit: Code for deserializing:

    using (FileStream fs = new FileStream(filename, FileMode.Open))
    {
        var bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        var data = (MyType)bf.Deserialize(fs);
        ...
    }
A: 

I've found serialization to carry quite a lot of overhead, and I'd expect deserializing this much data from a file to take a lot longer than querying it from a database. You're also reading the data from a file on disk, so that's going to be expensive as well. If you're trying to cache data, it's better to look at an in-memory option.

Charlie
After deserializing, the data will not be written to disk.
Kamarey
Sorry, that was a typo in my answer. I meant that you read the data from disk and then deserialize it. You want to store it in a file on disk rather than in a database, and that's going to be more of an overhead. Basically, it sounds like you should stick to a database as your datastore. Otherwise you might want to cache portions of your data in memory, but not by serializing it and writing it out to a file.
Charlie
A: 

I have noticed that it takes quite a while for .NET to initialize XmlSerializers. So if you are not already reusing the serializer objects, doing so should speed up the process significantly.
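
For example, a minimal sketch of reusing a single cached serializer instance (the MyType name and the static field are illustrative, not from the original question; requires System.IO and System.Xml.Serialization):

    // Construct the serializer once and reuse it; creating a new
    // XmlSerializer(typeof(MyType)) on every call pays the
    // code-generation cost each time.
    static readonly XmlSerializer CachedSerializer = new XmlSerializer(typeof(MyType));

    static MyType Load(string filename)
    {
        using (var fs = new FileStream(filename, FileMode.Open))
        {
            return (MyType)CachedSerializer.Deserialize(fs);
        }
    }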

Skrim
This is binary serialization, not XML. But maybe this applies to the binary serializer as well...
Kamarey
+2  A: 

Because of the way BinaryFormatter works, it is painfully slow. It injects a lot of reflection-based metadata into the binary file. I ran some tests against some rather large structures a few years back and found that XmlSerializer is both smaller and faster than the binary formatter. Go figure!

In either case, the serialization is done via reflection, which is slow. You might consider writing your own serialization mechanism.

I once created my own binary serialization mechanism (using direct file writes/reads), and it performed 20 times faster than the XML serializer, which in turn was faster than the binary formatter. The output was also significantly smaller.

You might want to consider doing something like that.
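
Something along these lines, for example. This is only a rough sketch: MyRecord with its Id and Name fields is a hypothetical stand-in for whatever your table rows actually contain.

    // Hand-rolled binary persistence with BinaryWriter/BinaryReader,
    // avoiding the reflection and metadata overhead of BinaryFormatter.
    static void Save(string path, List<MyRecord> records)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(records.Count);              // record count first
            foreach (var r in records)
            {
                writer.Write(r.Id);                   // fixed-size field
                writer.Write(r.Name ?? string.Empty); // length-prefixed string
            }
        }
    }

    static List<MyRecord> Load(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            int count = reader.ReadInt32();
            var records = new List<MyRecord>(count);
            for (int i = 0; i < count; i++)
            {
                records.Add(new MyRecord
                {
                    Id = reader.ReadInt32(),
                    Name = reader.ReadString()
                });
            }
            return records;
        }
    }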

Brian Genisio
A: 

I concur with Brian's post. If you write your own persistence logic you can eliminate the overhead of reflection calls and have full control over how data is loaded from disk. You'll have to write more code, but in this case that may be the price of optimisation.

Andy Holt
A: 

Have you profiled your specific scenario?

The comments assume the issue is reflection, but I'm currently working on a similar scenario (deserialization from a file into an in-memory object tree), and what seems to be happening is that BinaryFormatter.Deserialize() reads bytes one by one, or in very small chunks, in order to re-hydrate the objects.

Here are the three functions with the most exclusive time in my profile:

29% Microsoft.Win32.Win32Native::ReadFile
22% Microsoft.Win32.Win32Native::SetFilePointerWin32
12% Microsoft.Win32.Win32Native::SetFilePointer

I wonder if there's a way to tell BinaryFormatter to read in larger chunks, for example. I tried BufferedStream and MemoryStream with no luck...
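
For reference, pre-loading the whole file into memory before deserializing looks roughly like this (just a sketch of the idea, reusing the MyType name from the question; it didn't help in my case):

    // Read the entire file into RAM first, so BinaryFormatter's many small
    // reads and seeks hit a MemoryStream instead of the disk.
    using (var ms = new MemoryStream(File.ReadAllBytes(filename)))
    {
        var bf = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        var data = (MyType)bf.Deserialize(ms);
    }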

Ariel
Do you know if XmlSerializer uses BinaryFormatter? If yes, it should be as slow as the binary one too.
Kamarey
+1  A: 

http://www.codeproject.com/KB/cs/FastSerialization.aspx

This is a good customised serialisation class that I'm using in my own project, and it's pretty easy to use. Would recommend.

Apparently it's 4x faster than the standard formatter, and having easy access to the internal structures means you could add your own .NET 4 parallel improvements.

JaredBroad
Thanks, looks interesting
Kamarey