Serialization is needed whenever an object needs to be persisted or transmitted beyond the scope of its existence.
Persistence is the ability to save an object somewhere and load it later with the same state. For example:
- You might need to store an object instance on disk as part of a file.
- You might need to store an object in a database as a blob (binary large object).
Transmission is the ability to send an object outside of its original scope to some receiver. For example:
- You might need to transmit an instance of an object to a remote machine.
- You might need to transmit an instance to another AppDomain or process on the same machine.
For each of these, there must be some serial bit representation that can be stored, communicated, and then later used to reconstitute the original object. The process of turning an object into this series of bits is called "serialization", while the process of turning the series of bits into the original object is called "deserialization".
The actual representation of the object in serialized form can differ depending on what your goals are. For example, in C#, you have both XML serialization (via the `XmlSerializer` class) and binary serialization (through use of the `BinaryFormatter` class). Depending on your needs, you can even write your own custom serializer to do additional work such as compression or encryption. If you need a language- and platform-neutral serialization format, you can try Google's Protocol Buffers, which now has support for .NET (I have not used this).
The XML representation mentioned above is good for storing an object in a standard format, but it can be verbose and slow depending on your needs. The binary representation saves on space but isn't as portable across languages and runtimes as XML is. The important point is that the serializer and deserializer must understand each other. This can be a problem when you start introducing backward and forward compatibility and versioning.
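As a minimal sketch of the round trip described above, the following uses `XmlSerializer` to turn an object into XML text and back. The `Foo` type, its properties, and the `RoundTripDemo` class are made up for illustration; they are not from any real program.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for this illustration.
public class Foo
{
    public string Name { get; set; }
    public int Count { get; set; }
}

public static class RoundTripDemo
{
    public static void Main()
    {
        var original = new Foo { Name = "example", Count = 3 };
        var serializer = new XmlSerializer(typeof(Foo));

        // Serialization: object -> XML text.
        string xml;
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, original);
            xml = writer.ToString();
        }

        // Deserialization: XML text -> a new object with the same state.
        Foo copy;
        using (var reader = new StringReader(xml))
        {
            copy = (Foo)serializer.Deserialize(reader);
        }

        Console.WriteLine(xml);                          // the serialized form
        Console.WriteLine(copy.Name + " " + copy.Count); // example 3
    }
}
```

The same pattern works with a `FileStream` in place of the `StringWriter` and `StringReader` when the goal is persistence to disk.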
An example of potential serialization compatibility issues:
- You release version 1.0 of your program which is able to serialize some `Foo` object to a file.
- The user does some action to save his `Foo` to a file.
- You release version 2.0 of your program with an updated `Foo`.
- The user tries to open the version 1.0 file with your version 2.0 program.
This can be troublesome if the version 2.0 `Foo` has additional properties that the version 1.0 `Foo` didn't. You have to either explicitly not support this scenario or have some versioning story with your serialization. .NET can do some of this for you. In this case, you might also have the reverse problem: the user might try to open a version 2.0 `Foo` file with version 1.0 of your program.
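Both directions of this scenario can be sketched with `XmlSerializer`, which by default leaves a member untouched when its element is missing and skips elements it does not recognize. The `FooV1` and `FooV2` classes, the `Priority` property, and its default of 5 are assumptions invented for this example; they stand in for the two versions of `Foo`.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical version 1.0 shape of Foo.
[XmlRoot("Foo")]
public class FooV1
{
    public string Name { get; set; }
}

// Hypothetical version 2.0 shape of Foo with one additional property.
[XmlRoot("Foo")]
public class FooV2
{
    public string Name { get; set; }
    public int Priority { get; set; } = 5; // default used when the data is missing
}

public static class VersionDemo
{
    static string Save<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    static T Load<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Version 1.0 data opened by version 2.0: Priority keeps its default.
        string v1Data = Save(new FooV1 { Name = "old" });
        FooV2 upgraded = Load<FooV2>(v1Data);
        Console.WriteLine(upgraded.Name + " " + upgraded.Priority); // old 5

        // Version 2.0 data opened by version 1.0: the extra element is skipped.
        string v2Data = Save(new FooV2 { Name = "new", Priority = 9 });
        FooV1 downgraded = Load<FooV1>(v2Data);
        Console.WriteLine(downgraded.Name); // new
    }
}
```

Note that the version 1.0 program silently drops `Priority` here, so saving the object again from version 1.0 would lose that value. Whether that is acceptable is part of the versioning story you have to decide on.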
I have not used these techniques myself, but .NET 2.0 and later versions support version-tolerant serialization, which covers both forward and backward compatibility:
- Tolerance of extraneous or unexpected data. This enables newer versions of the type to send data to older versions.
- Tolerance of missing optional data. This enables older versions to send data to newer versions.
- Serialization callbacks. This enables intelligent default value setting in cases where data is missing.
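A rough sketch of how a version 2.0 type might opt in to these features, using the `OptionalField` and `OnDeserializing` attributes from `System.Runtime.Serialization`. The field names and the default value are hypothetical. The attributes are honored by formatter-based serializers such as `BinaryFormatter`, which is marked obsolete in recent .NET releases, so this only declares the type and does not run a formatter.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical version 2.0 of Foo; the field names are made up for illustration.
[Serializable]
public class Foo
{
    private string name;

    // Added in version 2.0. Marking it optional lets the formatter accept
    // version 1.0 data that does not contain this field.
    [OptionalField(VersionAdded = 2)]
    private int priority;

    // Serialization callback: runs before the fields are populated, so the
    // default below survives only when the incoming data has no value for it.
    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        priority = 5; // assumed default, chosen arbitrarily for this sketch
    }

    public string Name
    {
        get { return name; }
        set { name = value; }
    }

    public int Priority
    {
        get { return priority; }
        set { priority = value; }
    }
}
```

Tolerance of extraneous data needs no attribute: a version 1.0 type without the `priority` field simply ignores it when reading version 2.0 data.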