I inherited the following code (and data stored using it, i.e. serialized instances of A):

class A implements Serializable {
  private static final long serialVersionUID = 1L;
  int someField;
  B b;
}

class B implements Serializable {
  private static final long serialVersionUID = 1L;
  int someField;
}

At some point I realized that the b field in A should not actually be persisted (and B shouldn't be serializable at all), so I changed things to:

class A implements Serializable {
  private static final long serialVersionUID = 1L;
  int someField;
  transient B b;
}

class B {
  int someField;
}

If I make a new instance of A and serialize it, I have no trouble deserializing it. However, if I try to deserialize instances of A that were stored with the old code, I get an exception of the form:

java.io.InvalidClassException: B; B; class invalid for deserialization
  at java.io.ObjectStreamClass.checkDeserialize(Unknown Source)
  at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
  at java.io.ObjectInputStream.readObject0(Unknown Source)
  at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
  at java.io.ObjectInputStream.readSerialData(Unknown Source)
  at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
  at java.io.ObjectInputStream.readObject0(Unknown Source)
  at java.io.ObjectInputStream.readObject(Unknown Source)

I believe this is because the persisted data also stores the class descriptions for A and B, and those still treat b as a persisted field, even though the current version of the class no longer does (see lines 600 to 606 in ObjectStreamClass).

Is there a way to force A's deserialization to skip over fields that are now transient? For example, is there a way to override the class description that the deserialization code reads in ObjectInputStream and update its definition of b so that it knows it's transient?

+2  A: 

There is no trick to skip over certain fields during deserialization. In fact, many changes to the source code of a class (and, without an explicitly declared serialVersionUID, almost any change) make old serialized data impossible to deserialize. Serialization couples your source code very tightly to the serialized data.

This is why serialization is not suited for long-term data storage. Serialization is only suited for things like RMI (transporting objects over the network) or temporary storage on disk. Use a well-documented (standard) file format instead of Java serialization for long-term data storage.

What you can do is deserialize the data using the old code, then write it into another format, and from then on use only that format.

Jesper
I ended up writing a "fix-up" task that read the data and wrote it back using class definitions for A and B that were both backwards- and forwards-compatible: the b field was transient, but B was still marked as implementing Serializable. Once none of the persisted instances of A contained a B, I was able to remove the Serializable implementation from B.
Mihai Parparita
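A minimal sketch of the kind of fix-up task described in the comment above, assuming each stored record is available as a byte array. The names FixUpTask, rewrite, serialize and deserialize are hypothetical, and A and B are nested here only to keep the sketch in one file; for real stored data they would have to be the actual A and B classes, with the same fully qualified names and serialVersionUID as the ones that wrote it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FixUpTask {
    // Transitional A: b is transient, so it is no longer written out.
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        int someField;
        transient B b;
    }

    // Transitional B: still Serializable, so streams that contain a B
    // can still be read. Drop Serializable only after all data is rewritten.
    static class B implements Serializable {
        private static final long serialVersionUID = 1L;
        int someField;
    }

    // Serializes one object into a byte array.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    // Reads one A from a byte array.
    static A deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (A) in.readObject();
        }
    }

    // Reads a stored A with the transitional classes and writes it back.
    // The rewritten bytes no longer contain a B, because b is transient.
    static byte[] rewrite(byte[] stored) throws IOException, ClassNotFoundException {
        return serialize(deserialize(stored));
    }
}
```

Running rewrite over every stored record and saving the result leaves data that no longer depends on B being Serializable.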
A: 

This is why it is a bad idea to use serialization for long-term object storage. In your case, I think leaving B Serializable (even though the reference in A is transient) is enough to work around your problem.

In addition, you will probably want to implement your own readObject method to null out b in case it comes in with old data.
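As a rough sketch of that suggestion, assuming B stays Serializable: the wrapper class ReadObjectSketch and its roundTrip helper are hypothetical and exist only to keep the example in one file. Since b is transient, default deserialization should already leave it null, so the explicit assignment mainly documents the intent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReadObjectSketch {
    // B stays Serializable so that old streams containing a B remain readable.
    static class B implements Serializable {
        private static final long serialVersionUID = 1L;
        int someField;
    }

    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        int someField;
        transient B b;

        // Called by ObjectInputStream in place of the default field handling.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            // Restores someField; stream fields with no matching
            // serializable field in this class are read and discarded.
            in.defaultReadObject();
            // Make sure no B from an old stream survives deserialization.
            b = null;
        }
    }

    // Serializes an A and reads it straight back (helper for trying it out).
    static A roundTrip(A a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(a);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (A) in.readObject();
        }
    }
}
```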

I don't know if it is possible to implement a custom readObject method that would actually restore the class in the face of such an error. It would certainly be horribly messy.

Yishai