For various reasons I have a custom serialization where I am dumping some fairly simple objects to a data file. There are maybe 5-10 classes, and the object graphs that result are acyclic and pretty simple (each serialized object has 1 or 2 references to another that are serialized). For example:
class Foo
{
    private final long id;
    public Foo(long id, /* other stuff */) { ... }
}
class Bar
{
    private final long id;
    private final Foo foo;
    public Bar(long id, Foo foo, /* other stuff */) { ... }
}
class Baz
{
    private final long id;
    private final List<Bar> barList;
    public Baz(long id, List<Bar> barList, /* other stuff */) { ... }
}
The id field exists only for serialization. When writing to a file, I keep a record of which IDs have been serialized so far. For each object, I first check whether its child objects have been serialized and write any that haven't; then I write the object itself, as its data fields plus the IDs of its child objects.
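The dedup-on-write logic can be sketched roughly like this (a minimal illustration, not your actual format: `GraphWriter` and the string output are stand-ins for the real packet stream):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the dedup logic: children are written before parents,
// and a set of already-written ids prevents duplicates.
class GraphWriter {
    private final Set<Long> written = new HashSet<>();
    private final StringBuilder file = new StringBuilder(); // stands in for the real packet stream

    void writeFoo(long id) {
        if (!written.add(id)) return;      // add() returns false if id was already present
        file.append("Foo:").append(id).append('\n');
    }

    void writeBar(long barId, long fooId) {
        writeFoo(fooId);                   // ensure the child is on disk first
        if (!written.add(barId)) return;
        file.append("Bar:").append(barId).append(",foo=").append(fooId).append('\n');
    }

    String contents() { return file.toString(); }
}
```

Writing two Bars that share one Foo then emits the Foo record exactly once, before either Bar.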
What's puzzling me is how to assign IDs. Thinking it through, there seem to be three cases:
- dynamically created objects -- the ID is assigned from an incrementing counter
- objects read from disk -- the ID is assigned from the number stored in the file
- singleton objects -- the object is created before any dynamically created object, to represent a singleton that is always present
How can I handle these properly? I feel like I'm reinventing the wheel and there must be a well-established technique for handling all the cases.
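One scheme that covers all three cases (an assumption on my part, not an established standard) is to reserve a small fixed range for singletons and keep the live counter above the highest ID ever seen, whether freshly assigned or loaded from disk:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: singletons get fixed ids below FIRST_DYNAMIC_ID; the counter
// advances past any id read from disk so loaded ids are never reused.
class IdAssigner {
    static final long FIRST_DYNAMIC_ID = 100;   // ids 0..99 reserved for singletons
    private final AtomicLong next = new AtomicLong(FIRST_DYNAMIC_ID);

    long assignDynamic() {                      // case 1: newly created object
        return next.getAndIncrement();
    }

    void noteLoaded(long idFromDisk) {          // case 2: object read from file
        next.accumulateAndGet(idFromDisk + 1, Math::max);
    }
    // case 3: singletons simply use their fixed ids below FIRST_DYNAMIC_ID
}
```

After loading a file, every ID encountered is passed through noteLoaded, so new dynamic assignments can never collide with stored ones.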
clarification: just as some tangential information, the file format I am looking at is approximately the following (glossing over a few details which should not be relevant). It's optimized to handle a fairly large amount of dense binary data (tens/hundreds of MB) with the ability to intersperse structured data in it. The dense binary data makes up 99.9% of the file size.
The file consists of a series of error-corrected blocks which serve as containers. Each block can be thought of as containing a byte array which consists of a series of packets. It is possible to read the packets one at a time in succession (e.g. it's possible to tell where the end of each packet is, and the next one starts immediately afterwards).
So the file can be thought of as a series of packets stored on top of an error-correcting layer. The vast majority of these packets are opaque binary data that has nothing to do with this question. A small minority of these packets, however, are items containing serialized structured data, forming a sort of "archipelago" consisting of data "islands" which may be linked by object reference relationships.
So I might have a file where packet 2971 contains a serialized Foo, and packet 12083 contains a serialized Bar that refers to the Foo in packet 2971. (with packets 0-2970 and 2972-12082 being opaque data packets)
All these packets are immutable (and therefore, given the constraints of Java object construction, they form an acyclic object graph), so I don't have to deal with mutability issues. They are also descendants of a common Item interface. What I would like to do is write an arbitrary Item object to the file. If the Item contains references to other Items, I need to write those to the file too, but only if they haven't been written yet. Otherwise I will have duplicates that I will need to somehow coalesce when I read them back.
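On the read side, one way to coalesce duplicates (again a sketch, with hypothetical names) is to resolve every child reference through an id-to-object map, so two records pointing at the same ID end up sharing one instance:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Sketch of read-side coalescing: the registry hands back the canonical
// instance for an id, invoking the factory only on first sight.
class ItemRegistry {
    private final Map<Long, Object> byId = new HashMap<>();

    Object resolve(long id, LongFunction<Object> factory) {
        return byId.computeIfAbsent(id, factory::apply);
    }
}
```

If writing already guarantees each ID appears at most once in the file, the map is just the deserializer's lookup table; if duplicates can slip in, it silently merges them.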