views: 124
answers: 3
For various reasons I have a custom serialization where I am dumping some fairly simple objects to a data file. There are maybe 5-10 classes, and the object graphs that result are acyclic and pretty simple (each serialized object has 1 or 2 references to another that are serialized). For example:

class Foo
{
    final private long id;
    public Foo(long id, /* other stuff */) { ... }
}

class Bar
{
    final private long id;
    final private Foo foo;
    public Bar(long id, Foo foo, /* other stuff */) { ... }
}

class Baz
{
    final private long id;
    final private List<Bar> barList;
    public Baz(long id, List<Bar> barList, /* other stuff */) { ... }
}

The id field exists only for serialization: when writing to a file, I keep a record of which IDs have been serialized so far; for each object, I check whether its child objects have been serialized and write any that haven't; and finally I write the object itself, i.e. its data fields plus the IDs corresponding to its child objects.
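To make that concrete, here is a minimal sketch of that bookkeeping (the GraphWriter name and the getId()/getFoo() accessors are assumed here for illustration; they are not declared in the classes above):

import java.io.DataOutput;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

class GraphWriter {
    // ids already written to the current file
    private final Set<Long> writtenIds = new HashSet<Long>();

    void write(Bar bar, DataOutput out) throws IOException {
        write(bar.getFoo(), out);                // children go out first
        if (writtenIds.add(bar.getId())) {       // add() returns false if already written
            out.writeLong(bar.getId());          // then the object's own record:
            out.writeLong(bar.getFoo().getId()); // data fields plus the child's id
        }
    }

    void write(Foo foo, DataOutput out) throws IOException {
        if (writtenIds.add(foo.getId())) {
            out.writeLong(foo.getId());
            // ... foo's other data fields ...
        }
    }
}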

What's puzzling me is how to assign IDs. I thought about it, and it seems like there are three cases for assigning an ID:

  • dynamically-created objects -- id is assigned from a counter that increments
  • reading objects from disk -- id is assigned from the number stored in the disk file
  • singleton objects -- the object is created before any dynamically-created object, to represent a singleton that is always present.

How can I handle these properly? I feel like I'm reinventing the wheel and there must be a well-established technique for handling all the cases.


clarification: just as some tangential information, the file format I am looking at is approximately the following (glossing over a few details which should not be relevant). It's optimized to handle a fairly large amount of dense binary data (tens/hundreds of MB) with the ability to intersperse structured data in it. The dense binary data makes up 99.9% of the file size.

The file consists of a series of error-corrected blocks which serve as containers. Each block can be thought of as containing a byte array which consists of a series of packets. It is possible to read the packets one at a time in succession (i.e. it's possible to tell where the end of each packet is, and the next one starts immediately afterwards).

So the file can be thought of as a series of packets stored on top of an error-correcting layer. The vast majority of these packets are opaque binary data that has nothing to do with this question. A small minority of these packets, however, are items containing serialized structured data, forming a sort of "archipelago" consisting of data "islands" which may be linked by object reference relationships.

So I might have a file where packet 2971 contains a serialized Foo, and packet 12083 contains a serialized Bar that refers to the Foo in packet 2971. (with packets 0-2970 and 2972-12082 being opaque data packets)

All these packets are immutable (and therefore, given the constraints of Java object construction, they form an acyclic object graph), so I don't have to deal with mutability issues. They all implement a common Item interface. What I would like to do is write an arbitrary Item object to the file. If the Item contains references to other Items, I need to write those to the file too, but only if they haven't already been written. Otherwise I will have duplicates that I will need to somehow coalesce when I read them back.
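On the read side, one way to do that coalescing would be to intern items by id as they are decoded. A rough sketch (the ItemInterner name and the Item.getId() method are assumptions, not existing code):

import java.util.HashMap;
import java.util.Map;

class ItemInterner {
    private final Map<Long, Item> byId = new HashMap<Long, Item>();

    // Returns the canonical instance for this id; if an equivalent item was
    // already read from an earlier packet, the newly read duplicate is dropped.
    Item intern(Item item) {
        Item existing = byId.get(item.getId());
        if (existing != null) return existing;
        byId.put(item.getId(), item);
        return item;
    }
}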

+1  A: 

Are the foos registered with a FooRegistry? You could try this approach (assume Bar and Baz also have registries to acquire the references via the id).

This probably has lots of syntax errors, usage errors, etc. But I feel the approach is a good one.

public class Foo {

    private final long id;

    // dynamically-created object: the registry hands out a fresh id
    public Foo(/* other stuff */) {
        // construct
        this.id = FooRegistry.register(this);
    }

    // object read from disk: the id comes from the file
    public Foo(long id /* , other stuff */) {
        // construct
        this.id = id;
        FooRegistry.register(this, id);
    }

    public long getId() { return id; }
}

public class FooRegistry {

    private static final Map<Long, Foo> foos = new HashMap<Long, Foo>();
    private static long currentFooCount = 0;

    // assign the next unused id
    static long register(Foo foo) {
        while (foos.get(currentFooCount) != null) currentFooCount++;
        foos.put(currentFooCount, foo);
        return currentFooCount;
    }

    // register under an id that was read from the file
    static void register(Foo foo, long id) {
        if (foos.get(id) != null) throw new IllegalStateException("id already in use"); // invalid
        foos.put(id, foo);
    }
}

public class Bar {

    // id and foo fields as in the question
    void writeToStream(PrintWriter out) {
        out.print("<BAR><id>" + id + "</id><foo>" + foo.getId() + "</foo></BAR>");
    }
}

public class Baz {

    // id and barList fields as in the question
    void writeToStream(PrintWriter out) {
        out.print("<BAZ><id>" + id + "</id>");
        for (Bar bar : barList) out.print("<bar>" + bar.getId() + "</bar>");
        out.print("</BAZ>");
    }
}

glowcoder
+2  A: 

Do you really need to do this? Internally, the ObjectOutputStream tracks which objects have been serialized already. Subsequent writes of the same object only store an internal reference (similar to writing out just the id) rather than writing out the whole object again.

See Serialization Cache for more details.
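For example, a small demo of this behavior (assuming Foo implements Serializable; the HandleDemo class is just for illustration):

import java.io.*;

public class HandleDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        Foo foo = new Foo(42L /* , other stuff */);
        oos.writeObject(foo);   // full object data is written
        oos.writeObject(foo);   // only a back-reference (handle) is written
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        Foo a = (Foo) ois.readObject();
        Foo b = (Foo) ois.readObject();
        System.out.println(a == b);  // true -- the same instance comes back twice
        ois.close();
    }
}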

If the IDs correspond to some externally defined identity, such as an entity ID, then keeping them makes sense. But the question states that the IDs are generated purely to track which objects have been serialized.

You can handle singletons via the readResolve method. A simple approach is to compare the freshly deserialized instance with your singleton instances, and if there is a match, return the singleton instance rather than the deserialized instance. E.g.

   private Object readResolve() {
      return (this.equals(SINGLETON)) ? SINGLETON : this;
      // or simply
      // return SINGLETON;
   }

EDIT: In response to the comments, the stream is mostly binary data (stored in an optimized format) with complex objects interspersed in that data. This can be handled by using a stream format that supports substreams, e.g. zip, or simple block chunking. E.g. the stream can be a sequence of blocks:

offset 0   - block type
offset 4   - block length N
offset 8   - N bytes of data
...
offset N+8 - start of next block

You can then have blocks for binary data, blocks for serialized data, blocks for XStream-serialized data, etc. Since each block knows its size, you can create a substream to read up to that length from that point in the file. This allows you to freely mix data without parsing concerns.

To implement a substream, have your main stream parse the blocks, e.g.

   DataInputStream main = new DataInputStream(input);
   int blockType = main.readInt();
   int blockLength = main.readInt();
   // next N bytes are the data
   LimitInputStream data = new LimitInputStream(main, blockLength);

   if (blockType==BINARY) {
      handleBinaryBlock(new DataInputStream(data));
   }
   else if (blockType==OBJECTSTREAM) {
      deserialize(new ObjectInputStream(data));
   }
   else
      ...

A sketch of LimitInputStream looks like this:

import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LimitInputStream extends FilterInputStream
{
   private int bytesRead;
   private final int limit;

   /** Reads up to limit bytes from in */
   public LimitInputStream(InputStream in, int limit) {
      super(in);
      this.limit = limit;
   }

   public int read(byte[] data, int offs, int len) throws IOException {
      if (len==0) return 0; // read() contract mandates this
      if (bytesRead==limit)
         return -1;
      int toRead = Math.min(limit-bytesRead, len);
      int actuallyRead = super.read(data, offs, toRead);
      if (actuallyRead==-1)
          throw new EOFException("stream ended before the block was fully read");
      bytesRead += actuallyRead;
      return actuallyRead;
   }

   // similarly for the other read() methods

   // don't propagate close() to the underlying stream
   public void close() { }
}
mdma
+1 for making the point.... Do I really need to do this? I'd love to use some facility built into the JRE, but there are so many differences between ObjectOutputStream and what I'm doing that I don't know how to link the two together. My serialization is closer to XML serialization.
Jason S
Have you tried XStream - http://xstream.codehaus.org. It's serialization but based on XML. Very pluggable. It also uses a serialization cache - references to already serialized objects are written out as references in XML, either referring to an automatically generated id, or using XPath to refer to the original element that defined the object. Well worth a look.
mdma
I actually did take a look a few minutes before posting a comment. My problem in this particular case, is that I need to intersperse a few complex objects among a large set of binary-encoded raw data bytes that need to be stored in an optimized way since they use 99.9% of the file's space and I'm expecting files in the 10-100MB range. So I can't use XML... all I have are a bunch of disconnected islands among a larger data stream.
Jason S
XStream allows you to completely replace the actual file format, so you could use FastInfoset or some other binary standard. I'm assuming that your file format allows you to get hold of the data islands, and treat this as "substreams" of the main stream. Then you could store whatever you want in there, XML, FastInfoSet, protocol buffers etc. Just because the rest of your file is optimized binary, doesn't mean that all of it has to be. You can use chunking to split the data islands from the remainder of the stream. I'll elaborate more in my answer.
mdma
dumb question... how do you implement a substream?
Jason S
(e.g. each block does know its own length, I'm doing that)
Jason S
Not a dumb question - I've updated my answer.
mdma
OK, I see what you're getting at. Maybe "islands" was a bad term; really what I have is a data "archipelago". I will update my question to clarify.
Jason S
Accepted... I ended up keeping the data as "islands" and using Google gson to encode each one in a JSON notation. I have the possibility of duplicating some of the objects in the data file, but they're such a small part of the file size that it doesn't matter for file size, and if I care about object graph equivalence, I can coalesce multiple copies of equivalent objects upon reading them out from the file.
Jason S
This sounds good. I was going to propose extending ObjectOutputStream so that it writes the data packets that belong to an object after streaming the object. This will then preserve the object graph, with no duplicates, while allowing each object to write out the data that belongs to it.
mdma
A: 

I feel like I'm reinventing the wheel and there must be a well-established technique for handling all the cases.

Yes, it looks like default object serialization would do; otherwise you're likely optimizing prematurely.

You can change the format of the serialized data (as XMLEncoder does) to a more convenient one.
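For instance, a minimal sketch with XMLEncoder (note that it works on JavaBeans, so it assumes a no-arg constructor and getter/setter properties rather than the final fields shown in the question; the XmlDump name is just illustrative):

import java.beans.XMLEncoder;
import java.io.BufferedOutputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;

public class XmlDump {
    public static void dump(Object bean, String fileName) throws FileNotFoundException {
        XMLEncoder enc = new XMLEncoder(
                new BufferedOutputStream(new FileOutputStream(fileName)));
        enc.writeObject(bean);   // bean properties are written out as XML
        enc.close();
    }
}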

But if you insist, I think a singleton with a dynamic counter should do; just don't put the id in the public interface of the constructor:

class Foo {
    private final int id;
    public Foo( int id, /*other*/ ) { // drop the int id
    }
 }

So the class could act as a "sequence", and a long would probably be more appropriate to avoid problems with Integer.MAX_VALUE.

Using an AtomicLong from the java.util.concurrent.atomic package (to avoid having two threads assign the same id, and to avoid excessive synchronization) would help too.

import java.util.concurrent.atomic.AtomicLong;

class Sequencer {
    private static final AtomicLong sequenceNumber = new AtomicLong(0);
    public static long next() {
         return sequenceNumber.getAndIncrement();
    }
}

Now in each class you have

 class Foo {
      private final long id;
      public Foo( String name, String data, etc ) {
          this.id = Sequencer.next();
      }
 }

And that's it.

(Note: I don't remember whether deserializing the object invokes the constructor, but you get the idea.)

OscarRyz
??? this is confusing... you have Sequencer as a class with non-static methods, but you are invoking Sequencer.next() as though next is a static method. Also, I appreciate the help but I know how to do what you are saying to instantiate a counter; my question is more along the lines of how to manage *either* a counter-based assignment *or* read-back from the file *or* a static singleton. I can't use just one approach for constructors
Jason S
my bad I updated with the `static` for the sequencer...
OscarRyz