I have two lists (list1 and list2) containing references to some objects, where some of the list entries may point to the same object. For various reasons, I am serializing these lists to two separate files. When I deserialize the lists, I would like to ensure that I am not creating more objects than needed. In other words, it should still be possible for an entry of list1 to point to the same object as an entry of list2.

MyObject obj = new MyObject();
List<MyObject> list1 = new ArrayList<MyObject>();
List<MyObject> list2 = new ArrayList<MyObject>();
list1.add(obj);
list2.add(obj);

// serialize to file1.ser
ObjectOutputStream oos = new ObjectOutputStream(...);
oos.writeObject(list1);
oos.close();

// serialize to file2.ser
oos = new ObjectOutputStream(...);
oos.writeObject(list2);
oos.close();

I think that sections 3.4 and A.2 of the spec say that deserialization strictly results in the creation of new objects, but I'm not sure. If so, some possible solutions might involve:

  1. Implementing equals() and hashCode() and checking references manually.
  2. Creating a "container class" to hold everything and then serializing the container class.

Is there an easy way to ensure that objects are not duplicated upon deserialization?

Thanks.

+2  A: 

You can override the readResolve() method to replace what's read from the stream with anything you want.

private Object readResolve() throws ObjectStreamException {
  ...
}

This is typically used for enforcing singletons. Prior to Java 5 it was also used for typesafe enums. I've never seen it used for this scenario, but I guess there's no reason it couldn't be.

This will work for individual objects that you control, but I can't see how you'd make it work for the List itself. It could, however, ensure that the objects returned in that list aren't duplicated (by whatever criteria you choose).
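
A minimal sketch of how readResolve() could be used here, assuming each object carries some identifying value. The `id` field, the `CANONICAL` registry and the `roundTrip` helper are hypothetical names introduced for illustration; in-memory byte streams stand in for file1.ser and file2.ser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReadResolveDemo {

    // Hypothetical stand-in for the question's MyObject; the "id" field is an assumption.
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static, so it is never serialized. It holds strong references,
        // so a long-running program may need some form of eviction.
        private static final Map<String, MyObject> CANONICAL = new ConcurrentHashMap<>();

        private final String id;

        MyObject(String id) { this.id = id; }

        // Runs after this instance has been fully read; the returned object replaces it.
        private Object readResolve() throws ObjectStreamException {
            MyObject existing = CANONICAL.putIfAbsent(id, this);
            return existing != null ? existing : this;
        }
    }

    // Writes the list to its own stream and reads it back, like one independent .ser file.
    @SuppressWarnings("unchecked")
    static List<MyObject> roundTrip(List<MyObject> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(list);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (List<MyObject>) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MyObject obj = new MyObject("a");
        List<MyObject> list1 = new ArrayList<>();
        List<MyObject> list2 = new ArrayList<>();
        list1.add(obj);
        list2.add(obj);

        List<MyObject> copy1 = roundTrip(list1);
        List<MyObject> copy2 = roundTrip(list2);
        // Both copies now refer to one canonical instance.
        System.out.println(copy1.get(0) == copy2.get(0));
    }
}
```

Note that the duplicate instance is still created during deserialization and only then discarded, and that the canonical instance is whichever copy was read first, not the original object.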

cletus
This is the right approach. However, the OP needs to realize that the criterion for determining whether objects are duplicates must be based on the values of the objects' fields, not on the objects' identities.
Stephen C
+2  A: 

After deserializing the second list, you could iterate over its elements and replace each duplicate with a reference to the corresponding object in the first list.
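
A rough sketch of that post-processing step, assuming the element type implements equals() and hashCode() so that duplicates can be recognized by value. The `shareInstances` helper is a hypothetical name, and plain strings stand in for the deserialized objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;

class DedupDemo {

    // Replaces each element of 'second' that equals() an element of 'first'
    // (or an earlier element of 'second') with that already-seen instance.
    static <T> void shareInstances(List<T> first, List<T> second) {
        Map<T, T> canonical = new HashMap<>();
        for (T element : first) {
            canonical.putIfAbsent(element, element);
        }
        ListIterator<T> it = second.listIterator();
        while (it.hasNext()) {
            T candidate = it.next();
            T existing = canonical.putIfAbsent(candidate, candidate);
            if (existing != null) {
                it.set(existing); // swap in the shared instance
            }
        }
    }

    public static void main(String[] args) {
        // Two equal but distinct instances, as two separate streams would produce.
        List<String> list1 = new ArrayList<>();
        list1.add(new String("shared"));
        List<String> list2 = new ArrayList<>();
        list2.add(new String("shared"));

        shareInstances(list1, list2);
        System.out.println(list1.get(0) == list2.get(0));
    }
}
```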

According to section 3.7 of the spec, "The readResolve Method", readResolve() is not invoked on an object until that object is fully constructed.

stacker
@stacker - your second sentence is true but not pertinent. The `readResolve()` method does not need to return `this`.
Stephen C
@Stephen C - The intention was to mention that using readResolve() wouldn't help to prevent instantiation of unneeded objects.
stacker
This will probably be the least painful solution.
YGL
+2  A: 

I think that sections 3.4 and A.2 of the spec say that deserialization strictly results in the creation of new objects, but I'm not sure. If so, some possible solutions might involve: ...

2. Creating a "container class" to hold everything and then serializing the container class.

I read these statements as "if my understanding that deserialization always creates new objects is incorrect, then solution #2, writing both lists wrapped in a container class to a single stream, is an acceptable solution."

If I am understanding you correctly, you think that writing both lists out through a single container won't work because it will still result in duplicate objects ("strictly results in ... new objects"). This is incorrect. When the graph of objects (your wrapper class) is written out, each object is serialized only once, no matter how many times it occurs in the graph. When the graph is read back in, that object is not duplicated.

http://java.sun.com/javase/6/docs/api/java/io/ObjectOutputStream.html

The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also. Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.

So, if you can, use option #2.

Creating a "container class" to hold everything and then serializing the container class.
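
A small sketch of option #2, assuming both lists can be written to one stream. The `Container` class and the `roundTrip` helper are hypothetical names, and an in-memory byte stream stands in for the single file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ContainerDemo {

    // Hypothetical stand-in for the question's MyObject.
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical wrapper that puts both lists into one object graph.
    static class Container implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<MyObject> list1 = new ArrayList<>();
        final List<MyObject> list2 = new ArrayList<>();
    }

    // Writes the container to a single stream and reads it back.
    static Container roundTrip(Container container) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(container);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Container) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MyObject obj = new MyObject();
        Container container = new Container();
        container.list1.add(obj);
        container.list2.add(obj);

        Container copy = roundTrip(container);
        // Within one stream the shared reference is preserved.
        System.out.println(copy.list1.get(0) == copy.list2.get(0));
    }
}
```

The copy is still a new object, distinct from the original `obj`; what survives is the sharing between the two lists.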

Bert F
-1. You are neglecting the fact that there are 2 streams. The reference sharing mechanism works only within a single ObjectInputStream, not between 2 independent streams.
finnw
@finnw - Sorry, I'm not neglecting that. Yes, the asker did initially say 2 streams, but he later says, as I quoted above, `I think ... strictly results in the creation of new objects, but I'm not sure`. He goes on to say that, if his understanding was wrong (it was), `possible solutions` include creating a wrapper and writing a single stream (`2. Creating a 'container class' to hold everything and then serializing the container class.`). I was trying to be clear by quoting his text above, but I'll clarify my answer to try to clear up the confusion.
Bert F
OK, downvote removed
finnw