What are the deficiencies of the built-in BinaryFormatter-based .NET serialization? (Performance, flexibility, restrictions)

Please accompany your answer with some code if possible.

Example:

Custom objects being serialized must be decorated with the [Serializable] attribute or implement the ISerializable interface.

Less obvious example:

Anonymous types cannot be serialized.
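
To illustrate both points, here is a minimal sketch (the Person class and the values are invented for illustration):

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable] // without this attribute, Serialize() throws SerializationException
    class Person
    {
        public string Name;
    }

    class Program
    {
        static void Main()
        {
            var formatter = new BinaryFormatter();
            using (var stream = new MemoryStream())
            {
                formatter.Serialize(stream, new Person { Name = "Fred" }); // works

                var anon = new { Name = "Fred" };  // anonymous types are never [Serializable]
                formatter.Serialize(stream, anon); // throws SerializationException
            }
        }
    }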

+2  A: 

Given any random object, it's very difficult to prove whether it really is serializable.
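
For instance, Type.IsSerializable only inspects the type, not the object graph, so a check can pass while serialization still fails at run time (a sketch; the Wrapper class is invented):

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]
    class Wrapper
    {
        public object Payload; // may hold anything at run time
    }

    class Demo
    {
        static void Main()
        {
            Console.WriteLine(typeof(Wrapper).IsSerializable); // True

            // ...but streams are not serializable, so this instance is not:
            var w = new Wrapper { Payload = new MemoryStream() };
            new BinaryFormatter().Serialize(new MemoryStream(), w); // throws SerializationException
        }
    }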

Joel Coehoorn
+1  A: 

If you change the type you're serializing, all the old data you've serialized and stored is broken. If you had stored it in a database, or even as XML, it would be easier to convert the old data to the new format.

RossFabricant
This is not strictly true; the change has to be a breaking one... http://msdn.microsoft.com/en-us/library/system.runtime.serialization.optionalfieldattribute.aspx
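
For example, a field added in a later version can be marked with [OptionalField] so that old payloads still deserialize; a minimal sketch (the Customer class is invented):

    using System;
    using System.Runtime.Serialization;

    [Serializable]
    class Customer
    {
        public string Name;          // present in version 1

        [OptionalField(VersionAdded = 2)]
        public string Email;         // added in version 2; old streams simply leave it null

        [OnDeserialized]
        void OnDeserialized(StreamingContext context)
        {
            // supply a default when deserializing version-1 data
            if (Email == null) Email = "unknown";
        }
    }
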
Sam Saffron
+1  A: 

Versioning of data is handled through attributes. If you aren't worried about versioning then this is no problem. If you are, it is a huge problem.

The trouble with the attribute scheme is that it works pretty slickly for many trivial cases (such as adding a new property) but breaks down pretty rapidly when you try to do something like replace two enum values with a different, new enum value (or any number of common scenarios that come with long-lived persistent data).

I could go into lots of details describing the troubles. In the end, writing your own serializer is pretty darn easy if you need to...
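
As a sketch of that last point, implementing ISerializable lets you take over the stream contents and handle versioning explicitly (the Settings class and version constant here are invented, not code from the answer):

    using System;
    using System.Runtime.Serialization;

    [Serializable]
    class Settings : ISerializable
    {
        const int CurrentVersion = 2;
        public string Theme;

        public Settings() { }

        // custom deserialization: branch on a version number we wrote ourselves
        protected Settings(SerializationInfo info, StreamingContext context)
        {
            int version = info.GetInt32("version");
            Theme = version >= 2 ? info.GetString("theme") : "classic";
        }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("version", CurrentVersion);
            info.AddValue("theme", Theme);
        }
    }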

Jeff Kotula
+1  A: 

It isn't guaranteed that you can serialize objects back and forth between different Framework versions (say 1.0, 1.1, 3.5) or even different CLR implementations (Mono); again, XML is better for this purpose.

Jhonny D. Cano -Leftware-
Or a different binary format; see my answer...
Marc Gravell
+1  A: 

Another issue that came to mind:

The XmlSerializer classes live in a completely different namespace from the generic runtime formatters. And while they are very similar to use, XmlSerializer does not implement the IFormatter interface. You can't write code that simply swaps the serialization formatter in or out at run time between BinaryFormatter, XmlSerializer, or a custom formatter without jumping through some extra hoops.
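
One common workaround, sketched under the assumption that you control all the call sites, is to hide both APIs behind a hand-rolled interface (the ISerializer name and the adapter classes are invented):

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.Xml.Serialization;

    // a hand-rolled common interface, since XmlSerializer does not implement IFormatter
    interface ISerializer
    {
        void Serialize(Stream stream, object graph);
        object Deserialize(Stream stream);
    }

    class BinaryFormatterAdapter : ISerializer
    {
        readonly BinaryFormatter formatter = new BinaryFormatter();
        public void Serialize(Stream stream, object graph) { formatter.Serialize(stream, graph); }
        public object Deserialize(Stream stream) { return formatter.Deserialize(stream); }
    }

    class XmlSerializerAdapter : ISerializer
    {
        readonly XmlSerializer serializer;
        public XmlSerializerAdapter(Type type) { serializer = new XmlSerializer(type); }
        public void Serialize(Stream stream, object graph) { serializer.Serialize(stream, graph); }
        public object Deserialize(Stream stream) { return serializer.Deserialize(stream); }
    }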

Joel Coehoorn
XmlSerializer is meant for a totally different purpose than the runtime serializer and the formatters. You wouldn't want to swap.
John Saunders
+1  A: 

Types being serialized must be decorated with the [Serializable] attribute.

If you mean variables in a class, you are wrong. Public variables/properties are automatically serialized.

PoweRoy
I actually suspect he meant the class itself, but that's still wrong because you also have the option to implement ISerializable. Even if it's right, I don't think that's entirely a bad thing.
Joel Coehoorn
Sorry, I meant the class itself; I'll expand that example.
Sam Saffron
It depends on the API; BinaryFormatter/SoapFormatter work against fields (public or private); XmlSerializer works against public members (fields or properties); DataContractSerializer works (ideally) against members marked [DataMember].
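
A sketch of one class annotated for each API (the Invoice class is invented; only the attribute placement matters):

    using System;
    using System.Runtime.Serialization;
    using System.Xml.Serialization;

    // BinaryFormatter/SoapFormatter: [Serializable] on the type; all fields
    // (public or private) are captured unless marked [NonSerialized].
    [Serializable]
    // DataContractSerializer: opt-in via [DataContract]/[DataMember].
    [DataContract]
    public class Invoice
    {
        [NonSerialized]          // excluded by the field-based formatters
        private decimal cachedTotal;

        [DataMember]             // included by DataContractSerializer
        public int Id;

        // XmlSerializer: captures public fields/properties with no opt-in needed,
        // but ignores private state entirely.
        [XmlElement("number")]
        public string Number { get; set; }
    }
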
Marc Gravell
A: 

A slightly less obvious one is that performance is pretty poor for Object serialization.

Example

Time to serialize and deserialize 100,000 objects on my machine:

Time Elapsed 3 ms
Full Serialization Cycle: BinaryFormatter Int[100000]

Time Elapsed 1246 ms
Full Serialization Cycle: BinaryFormatter NumberObject[100000]

Time Elapsed 54 ms
Full Serialization Cycle: Manual NumberObject[100000]

In this simple example, serializing an object with a single int field is about 20x slower than doing it by hand. Granted, there is some type information in the serialized stream, but that hardly accounts for the 20x slowdown.
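
A sketch of a harness that produces this kind of comparison (the NumberObject class matches the labels above; the rest is invented, and timings will vary by machine):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]
    class NumberObject
    {
        public int Value;
    }

    class Benchmark
    {
        static void Main()
        {
            const int count = 100000;
            var items = new NumberObject[count];
            for (int i = 0; i < count; i++) items[i] = new NumberObject { Value = i };

            // BinaryFormatter round trip
            var sw = Stopwatch.StartNew();
            using (var ms = new MemoryStream())
            {
                var formatter = new BinaryFormatter();
                formatter.Serialize(ms, items);
                ms.Position = 0;
                formatter.Deserialize(ms);
            }
            Console.WriteLine("BinaryFormatter: {0} ms", sw.ElapsedMilliseconds);

            // "manual" round trip: write the single field directly
            sw = Stopwatch.StartNew();
            using (var ms = new MemoryStream())
            {
                var writer = new BinaryWriter(ms);
                foreach (var item in items) writer.Write(item.Value);
                ms.Position = 0;
                var reader = new BinaryReader(ms);
                for (int i = 0; i < count; i++) items[i].Value = reader.ReadInt32();
            }
            Console.WriteLine("Manual: {0} ms", sw.ElapsedMilliseconds);
        }
    }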

Sam Saffron
+5  A: 

If you mean BinaryFormatter:

  • being based on fields, is very version-intolerant; change private implementation details and it breaks (see the sketch after this list)
  • isn't cross-compatible with other platforms
  • isn't very friendly towards new fields
  • is assembly specific (metadata is burnt in)
  • is MS/.NET specific (and possibly .NET version specific)
  • isn't obfuscation-safe
  • isn't especially fast or small
  • doesn't work on light frameworks (CF?/Silverlight)
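
On the first point, for example: auto-implemented properties compile down to fields with generated names, and BinaryFormatter burns those field names into the stream, so even a private-looking refactoring breaks old payloads (a sketch; the class is invented):

    using System;

    [Serializable]
    class Account
    {
        // BinaryFormatter writes the compiler-generated backing field name
        // "<Owner>k__BackingField" into the stream; converting this property
        // to a plain field (or renaming it) makes old streams undeserializable.
        public string Owner { get; set; }
    }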

I've spent lots of time in this area, including writing a (free) implementation of Google's "protocol buffers" serialization API for .NET: protobuf-net.

protobuf-net is:

  • smaller and faster
  • cross-compatible with other implementations
  • extensible
  • contract-based
  • obfuscation-safe
  • assembly-independent
  • based on an open, documented standard
  • usable on all versions of .NET (caveat: not tested on Micro Framework)
  • able to hook into ISerializable (for remoting etc.) and WCF
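
A minimal usage sketch, assuming the protobuf-net library is referenced (the Person class is invented):

    using System.IO;
    using ProtoBuf; // protobuf-net

    [ProtoContract]
    class Person
    {
        // explicit field numbers form the contract, so the wire format
        // survives renames and is independent of the assembly
        [ProtoMember(1)] public int Id;
        [ProtoMember(2)] public string Name;
    }

    class Demo
    {
        static void Main()
        {
            var person = new Person { Id = 1, Name = "Fred" };
            using (var stream = new MemoryStream())
            {
                Serializer.Serialize(stream, person);              // write
                stream.Position = 0;
                var copy = Serializer.Deserialize<Person>(stream); // read back
            }
        }
    }
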
Marc Gravell
yerp I meant BinaryFormatter based serialization ... expanded the question to specify
Sam Saffron
A: 

I concur with the last answer. The performance is pretty poor. Recently, my team of coders finished converting a simulation from standard C++ to C++/CLI. Under C++ we had a hand-written persistence mechanism, which worked reasonably well. We decided to use the .NET serialization mechanism rather than re-write the old persistence mechanism.
The old simulation, with a memory footprint between 0.5 and 1 GB, most objects holding pointers to other objects, and thousands of objects at runtime, would persist to a binary file of about 10 to 15 MB in under a minute. Restoring from the file was comparable.
Using the same data files (running side by side), the runtime performance of the C++/CLI version is about twice that of the C++ version, until we do the persistence (serialization in the new version). Writing out takes between 3 and 5 minutes, and reading in takes between 10 and 20. The serialized files are about 5 times the size of the old files. Basically, we see a 19-fold increase in read time and a 5-fold increase in write time. This is unacceptable, and we are looking for ways to correct it.

In examining the binary files I discovered a few things:

1. The type and assembly data is written in clear text for all types. This is space-inefficient.
2. Every object/instance of every type has the bloated type/assembly information written out.

One thing we did in our hand-written persistence mechanism was write out a known-type table. As we discovered types while writing, we looked each one up in this table. If it was not there, an entry was created with the type info and an index was assigned, and from then on we passed the type info as an integer (type, data, type, data). This 'trick' cut down on the size tremendously. It may require going through the data twice; however, an 'on-the-fly' process could be developed, whereby the type is pushed to the stream in addition to being added to the table, if we could guarantee the order of restoration from the stream.
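
A sketch of that known-type-table trick (all names are invented; this is not the answer's actual code):

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Writes each type's name once and refers to it by index thereafter,
    // so the stream becomes (typeIndex, data, typeIndex, data, ...).
    // A matching reader can tell a new index apart from a known one because
    // new indices always equal the current size of its own table, and only
    // those are followed by a type name.
    class TypeTableWriter
    {
        readonly Dictionary<Type, int> table = new Dictionary<Type, int>();
        readonly BinaryWriter writer;

        public TypeTableWriter(BinaryWriter writer)
        {
            this.writer = writer;
        }

        public void WriteTypeRef(Type type)
        {
            int index;
            if (table.TryGetValue(type, out index))
            {
                writer.Write(index);                      // seen before: just the integer
            }
            else
            {
                index = table.Count;
                table.Add(type, index);
                writer.Write(index);                      // first sighting: index...
                writer.Write(type.AssemblyQualifiedName); // ...followed by the full name
            }
        }
    }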

I was hoping to re-implement some of the core serialization to optimize it this way, but alas, the classes are sealed! We may yet find a way to jerry-rig it.

James Garner