I'm writing my own IFormatter implementation and I cannot think of a way to resolve circular references between two types that both implement ISerializable.

Here's the usual pattern:

[Serializable]
class Foo : ISerializable
{
    private Bar m_bar;

    public Foo(Bar bar)
    {
        m_bar = bar;
        m_bar.Foo = this;
    }

    public Bar Bar
    {
        get { return m_bar; }
    }

    protected Foo(SerializationInfo info, StreamingContext context)
    {
        m_bar = (Bar)info.GetValue("1", typeof(Bar));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("1", m_bar);
    }
}

[Serializable]
class Bar : ISerializable
{
    private Foo m_foo;

    public Foo Foo
    {
        get { return m_foo; }
        set { m_foo = value; }
    }

    public Bar()
    { }

    protected Bar(SerializationInfo info, StreamingContext context)
    {
        m_foo = (Foo)info.GetValue("1", typeof(Foo));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("1", m_foo);
    }
}

I then do this:

Bar b = new Bar();
Foo f = new Foo(b);
bool equal = ReferenceEquals(b, b.Foo.Bar); // true

// Serialise and deserialise b

equal = ReferenceEquals(b, b.Foo.Bar);

If I use an out-of-the-box BinaryFormatter to serialise and deserialise b, the above test for reference-equality returns true as one would expect. But I cannot conceive of a way to achieve this in my custom IFormatter.

In a non-ISerializable situation I can simply revisit "pending" object fields using reflection once the target references have been resolved. But for objects implementing ISerializable it is not possible to inject new data using SerializationInfo.

Can anyone point me in the right direction?

A: 

You need to detect when the same object appears more than once in your object graph and tag each object in the output. When you reach the second or any later occurrence, output a "reference" to the existing tag instead of writing the object again.

Pseudo-code for serialization:

for each object
    if object seen before
        output the tag already created for the object, marked as a "tag-reference"
    else
        create and store a tag for the object
        output tag and object

Pseudo-code for deserialization:

while more data
    if reference-tag to existing object
        get object from storage keyed by the tag
    else
        construct instance to deserialize into
        store object in storage keyed by deserialized tag
        deserialize object

It is important that you do the last steps there in the order they're specified, so that you can correctly handle this case:

SomeObject obj = new SomeObject();
obj.ReferenceToSomeObject = obj;    // reference to itself

I.e., you cannot wait until the object is completely deserialized before storing it in your tag-storage, since you might need a reference to it from the storage while you are still deserializing it.
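To make the pseudo-code concrete, here is a minimal sketch of the tag/reference idea. It is not a real IFormatter: the Node type, the IdentityComparer helper, and the "obj:"/"ref:"/"null" token format are all made up for illustration, and a list of strings stands in for the output stream.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical graph node, used only for this illustration.
class Node
{
    public string Name;
    public Node Next;
}

// Keys objects by identity rather than by an overridden Equals/GetHashCode.
sealed class IdentityComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y);
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

static class TagDemo
{
    // Writes "obj:<tag>" plus the members the first time a node is seen,
    // and only "ref:<tag>" on every later occurrence.
    public static void Write(Node node, Dictionary<object, int> tags, List<string> output)
    {
        if (node == null) { output.Add("null"); return; }

        int tag;
        if (tags.TryGetValue(node, out tag))
        {
            output.Add("ref:" + tag);
            return;
        }

        tag = tags.Count + 1;
        tags.Add(node, tag);              // store the tag before writing members
        output.Add("obj:" + tag);
        output.Add(node.Name);
        Write(node.Next, tags, output);
    }

    // Mirrors Write. The new instance goes into 'seen' before its members
    // are read, so a reference back to it can be resolved.
    public static Node Read(Queue<string> input, Dictionary<int, Node> seen)
    {
        string token = input.Dequeue();
        if (token == "null") return null;

        int tag = int.Parse(token.Substring(4));
        if (token.StartsWith("ref:", StringComparison.Ordinal)) return seen[tag];

        Node node = new Node();
        seen.Add(tag, node);              // store before deserializing members
        node.Name = input.Dequeue();
        node.Next = Read(input, seen);
        return node;
    }
}
```

Create the tag dictionary with new Dictionary<object, int>(new IdentityComparer()). With a node whose Next points back at itself, Write emits the object once followed by a reference token, and Read hands back a node whose Next is the very same instance.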

Lasse V. Karlsen
I understand your point about "reference-tags" and my formatter uses this technique already. Thus your self-referencing example is not a problem for me. But I don't see how your answer helps me with ISerializable-implementing objects that reference each other. Are you able to address this specific issue? Thanks.
Chris
Not entirely sure what you mean. Are you talking about using the private constructor related to serialization?
Lasse V. Karlsen
Yes, exactly. The only way to inflate an object that implements ISerializable is to call its special constructor.
Chris
I don't know how that works. Either you would "deserialize" a surrogate object (don't know how to do that), or you would have to construct an empty object just in order to get an object reference, before you call the actual constructor (don't know how to do that either.)
Lasse V. Karlsen
Judging by the way the BinaryFormatter behaves, an object reference is created and passed around as needed. Subsequently the ISerializable constructor is called on the same object reference. No idea how this is being done, though.
Chris
Actually, I do. There's an overloaded form of ConstructorInfo.Invoke() that supports a target.
Chris
A: 

This situation is the reason for the FormatterServices.GetUninitializedObject method. The general idea is that if you have objects A and B which reference each other in their SerializationInfo, you can deserialize them as follows:

(For the purposes of this explanation, (SI,SC) refers to a type's deserialization constructor, i.e. the one which takes a SerializationInfo and a StreamingContext.)

  1. Pick one object to deserialize first. It shouldn't matter which you pick, as long as you don't pick one which is a value-type. Let's say you pick A.
  2. Call GetUninitializedObject to allocate (but not initialize) an instance of A's type, because you're not yet ready to call its (SI,SC) constructor.
  3. Build B in the usual way, i.e. create a SerializationInfo object (which will include the reference to the now half-deserialized A) and pass it to B's (SI,SC) constructor.
  4. Now you have all the dependencies you need to initialize your allocated A object. Create its SerializationInfo object and call A's (SI,SC) constructor. You can call a constructor on an existing instance via reflection.
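As a rough sketch of those four steps, using trimmed-down versions of the Foo and Bar classes from the question: the CycleDemo class and its helper are names I made up, and the SerializationInfo objects are filled by hand here, whereas a real formatter would populate them from its stream. This is written against the .NET Framework-era API; newer runtimes may flag some of these types as obsolete.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
class Foo : ISerializable
{
    private Bar m_bar;
    public Bar Bar { get { return m_bar; } }

    protected Foo(SerializationInfo info, StreamingContext context)
    {
        m_bar = (Bar)info.GetValue("1", typeof(Bar));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("1", m_bar);
    }
}

[Serializable]
class Bar : ISerializable
{
    private Foo m_foo;
    public Foo Foo { get { return m_foo; } }

    protected Bar(SerializationInfo info, StreamingContext context)
    {
        m_foo = (Foo)info.GetValue("1", typeof(Foo));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("1", m_foo);
    }
}

static class CycleDemo
{
    // Looks up the (SI,SC) constructor, which is usually non-public.
    static ConstructorInfo GetDeserializationCtor(Type type)
    {
        return type.GetConstructor(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
            null,
            new Type[] { typeof(SerializationInfo), typeof(StreamingContext) },
            null);
    }

    public static Bar Rebuild()
    {
        StreamingContext context = new StreamingContext(StreamingContextStates.All);
        FormatterConverter converter = new FormatterConverter();

        // Steps 1 and 2: pick Foo, allocate it without running any constructor.
        Foo foo = (Foo)FormatterServices.GetUninitializedObject(typeof(Foo));

        // Step 3: build Bar in the usual way; its info refers to the
        // allocated but not yet initialized Foo.
        SerializationInfo barInfo = new SerializationInfo(typeof(Bar), converter);
        barInfo.AddValue("1", foo);
        Bar bar = (Bar)GetDeserializationCtor(typeof(Bar))
            .Invoke(new object[] { barInfo, context });

        // Step 4: run Foo's (SI,SC) constructor on the existing instance.
        SerializationInfo fooInfo = new SerializationInfo(typeof(Foo), converter);
        fooInfo.AddValue("1", bar);
        GetDeserializationCtor(typeof(Foo))
            .Invoke(foo, new object[] { fooInfo, context });

        return bar;
    }
}
```

With this, ReferenceEquals(b, b.Foo.Bar) should hold for the Bar returned by CycleDemo.Rebuild(), because Bar received its reference to the Foo instance before that instance was initialized, and the instance itself was never replaced.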

The GetUninitializedObject method is pure CLR magic - it creates an instance without ever calling a constructor to initialize that instance. It basically sets all fields to zero/null.

This is the reason you are cautioned not to use any of the members of a child object in a (SI,SC) constructor - a child object may be allocated but not yet initialized at that point. It is also the reason for the IDeserializationCallback interface, which gives you a chance to use your child objects after all object initialization is guaranteed to be done and before the deserialized object graph is returned.
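A small sketch of that split between the (SI,SC) constructor and IDeserializationCallback might look like the following. The Parent and Child types and the "child" key are hypothetical; the point is only that the constructor stores the reference without touching it, and anything that reads the child's members waits for OnDeserialization, which the formatter is expected to call once the whole graph is initialized.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical types, for illustration only.
[Serializable]
class Child
{
    public string Name = "child";
}

[Serializable]
class Parent : ISerializable, IDeserializationCallback
{
    private Child m_child;
    private string m_label;   // derived from m_child, so computed late

    public string Label { get { return m_label; } }

    protected Parent(SerializationInfo info, StreamingContext context)
    {
        // Only store the reference here. The child may be allocated but
        // not yet initialized, so none of its members are read.
        m_child = (Child)info.GetValue("child", typeof(Child));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("child", m_child);
    }

    // Called after the graph is complete, so the child is safe to use.
    public void OnDeserialization(object sender)
    {
        m_label = "parent of " + m_child.Name;
    }
}
```

In a custom formatter this means keeping a list of every deserialized object that implements IDeserializationCallback and calling OnDeserialization on each one after the last constructor has run, before handing the graph back to the caller.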

The ObjectManager class can do all of this (and other types of fix-ups) for you. However, I've always found it to be quite under-documented given the complexity of deserialization, so I never spent the time to try to figure out how to use it properly. For step 4 it uses some internal-to-the-CLR reflection that is optimized to call the (SI,SC) constructor more quickly (I've timed it at about twice as fast as the public way).

Finally, there are object graphs involving cycles which are impossible to deserialize. One example is when you have a cycle of two IObjectReference instances (I've tested BinaryFormatter on this and it throws an exception). Another, I suspect, is a cycle involving nothing but boxed value-types.

Wesley Hill