What are the deficiencies of the built-in BinaryFormatter-based .NET serialization? (Performance, flexibility, restrictions)

Please accompany your answer with some code if possible.

Example:

Custom objects being serialized must be decorated with the [Serializable] attribute or implement the ISerializable interface.

Less obvious example:

Anonymous types cannot be serialized.
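
To illustrate both points, here is a minimal sketch (the Person class and the values are invented for illustration):

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable] // without this attribute, Serialize() throws SerializationException
    class Person
    {
        public string Name;
    }

    class Program
    {
        static void Main()
        {
            var formatter = new BinaryFormatter();
            using (var stream = new MemoryStream())
            {
                formatter.Serialize(stream, new Person { Name = "Fred" }); // works

                var anon = new { Name = "Fred" };  // anonymous types are never [Serializable]
                formatter.Serialize(stream, anon); // throws SerializationException
            }
        }
    }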

+2  A: 

Given any random object, it's very difficult to prove whether it really is serializable.
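
For instance, Type.IsSerializable only inspects the type, not the object graph, so a check can pass while serialization still fails at run time (a sketch; the Wrapper class is invented):

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]
    class Wrapper
    {
        public object Payload; // may hold anything at run time
    }

    class Demo
    {
        static void Main()
        {
            Console.WriteLine(typeof(Wrapper).IsSerializable); // True

            // ...but streams are not serializable, so this instance is not:
            var w = new Wrapper { Payload = new MemoryStream() };
            new BinaryFormatter().Serialize(new MemoryStream(), w); // throws SerializationException
        }
    }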

Joel Coehoorn
+1  A: 

If you change the type you're serializing, all the old data you've serialized and stored is broken. If you had stored it in a database, or even as XML, it would be easier to convert the old data to the new format.

RossFabricant
This is not strictly true; the change has to be a breaking one... http://msdn.microsoft.com/en-us/library/system.runtime.serialization.optionalfieldattribute.aspx
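
For example, a field added in a later version can be marked with [OptionalField] so that old payloads still deserialize; a minimal sketch (the Customer class is invented):

    using System;
    using System.Runtime.Serialization;

    [Serializable]
    class Customer
    {
        public string Name;          // present in version 1

        [OptionalField(VersionAdded = 2)]
        public string Email;         // added in version 2; old streams simply leave it null

        [OnDeserialized]
        void OnDeserialized(StreamingContext context)
        {
            // supply a default when deserializing version-1 data
            if (Email == null) Email = "unknown";
        }
    }
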
Sam Saffron
+1  A: 

Versioning of data is handled through attributes. If you aren't worried about versioning then this is no problem. If you are, it is a huge problem.

The trouble with the attribute scheme is that it works pretty slickly for many trivial cases (such as adding a new property) but breaks down pretty rapidly when you try to do something like replace two enum values with a different, new enum value (or any number of common scenarios that come with long-lived persistent data).

I could go into lots of details describing the troubles. In the end, writing your own serializer is pretty darn easy if you need to...
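
As a sketch of that last point, implementing ISerializable lets you take over the stream contents and handle versioning explicitly (the Settings class and version constant here are invented, not code from the answer):

    using System;
    using System.Runtime.Serialization;

    [Serializable]
    class Settings : ISerializable
    {
        const int CurrentVersion = 2;
        public string Theme;

        public Settings() { }

        // custom deserialization: branch on a version number we wrote ourselves
        protected Settings(SerializationInfo info, StreamingContext context)
        {
            int version = info.GetInt32("version");
            Theme = version >= 2 ? info.GetString("theme") : "classic";
        }

        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("version", CurrentVersion);
            info.AddValue("theme", Theme);
        }
    }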

Jeff Kotula
+1  A: 

It isn't guaranteed that you can serialize objects back and forth between different Framework versions (say 1.0, 1.1, 3.5) or even different CLR implementations (Mono); again, XML is better for this purpose.

Jhonny D. Cano -Leftware-
Or a different binary format; see my answer...
Marc Gravell
+1  A: 

Another issue that came to mind:

The XmlSerializer classes live in a completely different namespace from the generic runtime formatters. And while they are very similar to use, XmlSerializer does not implement the IFormatter interface. You can't write code that simply swaps the serialization formatter in or out at run time between BinaryFormatter, XmlSerializer, or a custom formatter without jumping through some extra hoops.
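
One common workaround, sketched under the assumption that you control all the call sites, is to hide both APIs behind a hand-rolled interface (the ISerializer name and the adapter classes are invented):

    using System;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;
    using System.Xml.Serialization;

    // a hand-rolled common interface, since XmlSerializer does not implement IFormatter
    interface ISerializer
    {
        void Serialize(Stream stream, object graph);
        object Deserialize(Stream stream);
    }

    class BinaryFormatterAdapter : ISerializer
    {
        readonly BinaryFormatter formatter = new BinaryFormatter();
        public void Serialize(Stream stream, object graph) { formatter.Serialize(stream, graph); }
        public object Deserialize(Stream stream) { return formatter.Deserialize(stream); }
    }

    class XmlSerializerAdapter : ISerializer
    {
        readonly XmlSerializer serializer;
        public XmlSerializerAdapter(Type type) { serializer = new XmlSerializer(type); }
        public void Serialize(Stream stream, object graph) { serializer.Serialize(stream, graph); }
        public object Deserialize(Stream stream) { return serializer.Deserialize(stream); }
    }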

Joel Coehoorn
XmlSerializer is meant for a totally different purpose than the runtime serializer and the formatters. You wouldn't want to swap.
John Saunders
+1  A: 

Types being serialized must be decorated with the [Serializable] attribute.

If you mean variables in a class, you are wrong. Public variables/properties are automatically serialized.

PoweRoy
I actually suspect he meant the class itself, but that's still wrong because you also have the option to implement ISerializable. Even if it's right, I don't think that's entirely a bad thing.
Joel Coehoorn
Sorry, I meant the class itself; I'll expand that example.
Sam Saffron
It depends on the API; BinaryFormatter/SoapFormatter work against fields (public or private); XmlSerializer works against public members (fields or properties); DataContractSerializer works (ideally) against members marked [DataMember].
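
A sketch of one class annotated for each API (the Invoice class is invented; only the attribute placement matters):

    using System;
    using System.Runtime.Serialization;
    using System.Xml.Serialization;

    // BinaryFormatter/SoapFormatter: [Serializable] on the type; all fields
    // (public or private) are captured unless marked [NonSerialized].
    [Serializable]
    // DataContractSerializer: opt-in via [DataContract]/[DataMember].
    [DataContract]
    public class Invoice
    {
        [NonSerialized]          // excluded by the field-based formatters
        private decimal cachedTotal;

        [DataMember]             // included by DataContractSerializer
        public int Id;

        // XmlSerializer: captures public fields/properties with no opt-in needed,
        // but ignores private state entirely.
        [XmlElement("number")]
        public string Number { get; set; }
    }
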
Marc Gravell
A: 

A slightly less obvious one is that performance is pretty poor for Object serialization.

Example

Time to serialize and deserialize 100,000 objects on my machine:

Time Elapsed 3 ms
Full Serialization Cycle: BinaryFormatter Int[100000]

Time Elapsed 1246 ms
Full Serialization Cycle: BinaryFormatter NumberObject[100000]

Time Elapsed 54 ms
Full Serialization Cycle: Manual NumberObject[100000]

In this simple example, serializing an object with a single int field is about 20x slower than doing it by hand. Granted, there is some type information in the serialized stream, but that hardly accounts for the 20x slowdown.
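
A sketch of a harness that produces this kind of comparison (the NumberObject class matches the labels above; the rest is invented, and timings will vary by machine):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]
    class NumberObject
    {
        public int Value;
    }

    class Benchmark
    {
        static void Main()
        {
            const int count = 100000;
            var items = new NumberObject[count];
            for (int i = 0; i < count; i++) items[i] = new NumberObject { Value = i };

            // BinaryFormatter round trip
            var sw = Stopwatch.StartNew();
            using (var ms = new MemoryStream())
            {
                var formatter = new BinaryFormatter();
                formatter.Serialize(ms, items);
                ms.Position = 0;
                formatter.Deserialize(ms);
            }
            Console.WriteLine("BinaryFormatter: {0} ms", sw.ElapsedMilliseconds);

            // "manual" round trip: write the single field directly
            sw = Stopwatch.StartNew();
            using (var ms = new MemoryStream())
            {
                var writer = new BinaryWriter(ms);
                foreach (var item in items) writer.Write(item.Value);
                ms.Position = 0;
                var reader = new BinaryReader(ms);
                for (int i = 0; i < count; i++) items[i].Value = reader.ReadInt32();
            }
            Console.WriteLine("Manual: {0} ms", sw.ElapsedMilliseconds);
        }
    }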

Sam Saffron
+5  A: 

If you mean BinaryFormatter:

  • being based on fields, is very version-intolerant; change private implementation details and it breaks (see the sketch after this list)
  • isn't cross-compatible with other platforms
  • isn't very friendly towards new fields
  • is assembly specific (metadata is burnt in)
  • is MS/.NET specific (and possibly .NET version specific)
  • isn't obfuscation-safe
  • isn't especially fast or small
  • doesn't work on light frameworks (CF?/Silverlight)
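
On the first point, for example: auto-implemented properties compile down to fields with generated names, and BinaryFormatter burns those field names into the stream, so even a private-looking refactoring breaks old payloads (a sketch; the class is invented):

    using System;

    [Serializable]
    class Account
    {
        // BinaryFormatter writes the compiler-generated backing field name
        // "<Owner>k__BackingField" into the stream; converting this property
        // to a plain field (or renaming it) makes old streams undeserializable.
        public string Owner { get; set; }
    }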

I've spent lots of time in this area, including writing a (free) implementation of Google's "protocol buffers" serialization API for .NET: protobuf-net.

protobuf-net is:

  • smaller and faster
  • cross-compatible with other implementations
  • extensible
  • contract-based
  • obfuscation-safe
  • assembly-independent
  • based on an open, documented standard
  • usable on all versions of .NET (caveat: not tested on Micro Framework)
  • able to hook into ISerializable (for remoting etc.) and WCF
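
A minimal usage sketch, assuming the protobuf-net library is referenced (the Person class is invented):

    using System.IO;
    using ProtoBuf; // protobuf-net

    [ProtoContract]
    class Person
    {
        // explicit field numbers form the contract, so the wire format
        // survives renames and is independent of the assembly
        [ProtoMember(1)] public int Id;
        [ProtoMember(2)] public string Name;
    }

    class Demo
    {
        static void Main()
        {
            var person = new Person { Id = 1, Name = "Fred" };
            using (var stream = new MemoryStream())
            {
                Serializer.Serialize(stream, person);              // write
                stream.Position = 0;
                var copy = Serializer.Deserialize<Person>(stream); // read back
            }
        }
    }
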
Marc Gravell
yerp I meant BinaryFormatter based serialization ... expanded the question to specify
Sam Saffron
A: 

I concur with the last answer. The performance is pretty poor. Recently, my team of coders finished converting a simulation from standard C++ to C++/CLI. Under C++ we had a hand-written persistence mechanism, which worked reasonably well. We decided to use the .NET serialization mechanism rather than re-write the old persistence mechanism.
The old simulation, with a memory footprint between 0.5 and 1 GB, most objects holding pointers to other objects, and thousands of objects at runtime, would persist to a binary file of about 10 to 15 MB in under a minute. Restoring from the file was comparable.
Using the same data files (running side by side), the runtime performance of the C++/CLI version is about twice that of the C++ version, until we do the persistence (serialization in the new version). Writing out takes between 3 and 5 minutes, and reading in takes between 10 and 20. The serialized files are about 5 times the size of the old files. Basically, we see a 19-fold increase in read time and a 5-fold increase in write time. This is unacceptable, and we are looking for ways to correct it.

In examining the binary files I discovered a few things:

1. The type and assembly data is written in clear text for all types. This is space-inefficient.
2. Every object/instance of every type has the bloated type/assembly information written out.

One thing we did in our hand-written persistence mechanism was write out a known-type table. As we discovered types while writing, we looked each one up in this table. If it was not there, an entry was created with the type info and an index was assigned, and from then on we passed the type info as an integer (type, data, type, data). This 'trick' cut down on the size tremendously. It may require going through the data twice; however, an 'on-the-fly' process could be developed, whereby the type is pushed to the stream in addition to being added to the table, if we could guarantee the order of restoration from the stream.
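
A sketch of that known-type-table trick (all names are invented; this is not the answer's actual code):

    using System;
    using System.Collections.Generic;
    using System.IO;

    // Writes each type's name once and refers to it by index thereafter,
    // so the stream becomes (typeIndex, data, typeIndex, data, ...).
    // A matching reader can tell a new index apart from a known one because
    // new indices always equal the current size of its own table, and only
    // those are followed by a type name.
    class TypeTableWriter
    {
        readonly Dictionary<Type, int> table = new Dictionary<Type, int>();
        readonly BinaryWriter writer;

        public TypeTableWriter(BinaryWriter writer)
        {
            this.writer = writer;
        }

        public void WriteTypeRef(Type type)
        {
            int index;
            if (table.TryGetValue(type, out index))
            {
                writer.Write(index);                      // seen before: just the integer
            }
            else
            {
                index = table.Count;
                table.Add(type, index);
                writer.Write(index);                      // first sighting: index...
                writer.Write(type.AssemblyQualifiedName); // ...followed by the full name
            }
        }
    }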

I was hoping to re-implement some of the core serialization to optimize it this way, but alas, the classes are sealed! We may yet find a way to jerry-rig it.

James Garner