I need to serialize a Java object that might change later on; for example, some of its fields may be added or removed. What are the pitfalls of this approach, and what precautions should I take if this remains the only way out?
I suppose the short answer is that you will have to implement some sort of custom deserialization process that knows about the changes and deserializes older versions of the object correctly. You should also declare a serialVersionUID field, which identifies the version of the class: if the value stored in the stream does not match the one in the class, deserialization fails with an InvalidClassException, so change it only when you want older data to be rejected. You can read more about this here
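One way such a version-aware deserialization could look is sketched below. It is only an illustration under assumptions: the class `VersionedPoint`, its format numbers, the `"unlabeled"` default, and the `legacy` helper are all hypothetical, and the `formatToWrite` field exists only so the example can emit the older layout without keeping an old class file around.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical class: writes an explicit format number ahead of its data.
class VersionedPoint implements Serializable {
    private static final long serialVersionUID = 1L; // kept constant across formats
    private static final int CURRENT_FORMAT = 2;

    // All state is transient because writeObject/readObject handle it by hand.
    private transient int x;
    private transient int y;
    private transient String label; // assumed to have been added in format 2
    // Demo-only hook so the example can produce the older layout.
    private transient int formatToWrite = CURRENT_FORMAT;

    VersionedPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = Objects.requireNonNull(label);
    }

    // Builds an instance that serializes itself in the format 1 layout (no label).
    static VersionedPoint legacy(int x, int y) {
        VersionedPoint p = new VersionedPoint(x, y, "");
        p.formatToWrite = 1;
        return p;
    }

    int x() { return x; }
    int y() { return y; }
    String label() { return label; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(formatToWrite); // the format number always comes first
        out.writeInt(x);
        out.writeInt(y);
        if (formatToWrite >= 2) {
            out.writeUTF(label);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int format = in.readInt();
        if (format < 1 || format > CURRENT_FORMAT) {
            throw new InvalidObjectException("Unsupported format " + format);
        }
        x = in.readInt();
        y = in.readInt();
        // Older streams carry no label, so substitute a default.
        label = (format >= 2) ? in.readUTF() : "unlabeled";
        // Field initializers do not run during deserialization; restore the default.
        formatToWrite = CURRENT_FORMAT;
    }

    // Serializes to memory and reads the object back.
    static VersionedPoint roundTrip(VersionedPoint p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (VersionedPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Writing the format number yourself means `readObject` can branch on it, which a bare serialVersionUID does not allow, since a mismatch there simply aborts deserialization.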
- You definitely need to add a serialVersionUID field right from the beginning.
- Changes might make the serialized objects incompatible. Adding or removing fields can violate class contracts (up to the point of exceptions being thrown): when an instance is deserialized from a stream written by a class version that lacked a field the current version expects, that field is set to its type's default value. The most likely problems are NullPointerExceptions. This can be averted by implementing readObject() and writeObject(). Other changes (such as changing a field's type) can cause deserialization to fail entirely.
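A minimal sketch of the readObject() approach mentioned above, assuming a hypothetical `Customer` class whose `email` field was added in a later version. Since an old class file is not available here, a `null` email stands in for a stream written before the field existed; the placeholder address is likewise made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for illustration.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L; // declared from the beginning

    private String name;
    private String email; // imagine this field was added in a later class version

    Customer(String name, String email) {
        this.name = name;
        this.email = email; // null is allowed here only to simulate old data
    }

    String name() { return name; }
    String email() { return email; }

    // Would throw a NullPointerException if email were left null.
    String emailDomain() {
        return email.substring(email.indexOf('@') + 1);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // fields missing from the stream keep their default values
        if (name == null) {
            throw new InvalidObjectException("name is required");
        }
        if (email == null) {
            email = "unknown@example.invalid"; // restore the class invariant
        }
    }

    // Serializes to memory and reads the object back.
    static Customer roundTrip(Customer c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Customer) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this in place, `Customer.roundTrip(new Customer("Ann", null)).emailDomain()` returns `"example.invalid"` instead of throwing.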
When you know that your serialized object will change in the future, you should create a new serialized object in another namespace instead of changing an existing one.
And adding a serialVersionUID, as Michael described, is also a must.
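As a rough illustration of introducing a new class rather than editing the existing one, the sketch below keeps the old class untouched and converts its instances on load. The names `SettingsV1`, `SettingsV2`, and `SettingsLoader`, as well as the default font size, are assumptions made up for this example.

```java
import java.io.Serializable;

// The original class stays exactly as it was, so old data keeps deserializing.
class SettingsV1 implements Serializable {
    private static final long serialVersionUID = 1L;
    final String theme;

    SettingsV1(String theme) {
        this.theme = theme;
    }
}

// The changed shape lives in a new class with its own name.
class SettingsV2 implements Serializable {
    private static final long serialVersionUID = 1L;
    final String theme;
    final int fontSize; // new in this version

    SettingsV2(String theme, int fontSize) {
        this.theme = theme;
        this.fontSize = fontSize;
    }

    // Upgrades old data, supplying a default for the field it never had.
    static SettingsV2 fromV1(SettingsV1 old) {
        return new SettingsV2(old.theme, 12);
    }
}

class SettingsLoader {
    // Accepts whatever ObjectInputStream.readObject() returned and normalizes it.
    static SettingsV2 upgrade(Object deserialized) {
        if (deserialized instanceof SettingsV2) {
            return (SettingsV2) deserialized;
        }
        if (deserialized instanceof SettingsV1) {
            return SettingsV2.fromV1((SettingsV1) deserialized);
        }
        throw new IllegalArgumentException("Unknown settings type: " + deserialized);
    }
}
```

The rest of the program then only ever sees `SettingsV2`, and the conversion logic for each old version stays in one place.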
As Michael pointed out, Java provides some support for serialization with java.io.Serializable. The main problem with the Java support is that versioning is clunky and requires the user to deal with it.
Instead, I would recommend something like Google's Protocol Buffers or Apache Thrift. For both, you define the object in a very simple language and they generate the serialization code for you. Both also handle most of the versioning for you, as long as you follow their schema evolution rules (for example, never reusing a field number), so you don't have to worry about whether you are reading an old or a new version of the object.
For example, suppose you have a type foo with a field bar, and you write a bunch of foo objects to disk. Some time later you add a field baz to foo and write a few more foo objects to disk. When you read them back, they will all be foo objects; it will seem as if the original foo objects simply never set their baz field.