Hello.

How do you 'de-serialize' a derived class from serialized data? Or maybe I should say, is there a better way to 'de-serialize' data into derived classes?

For example, suppose you had an abstract base class (B) that is inherited by three other classes, X, Y and Z. Moreover, each of them has a method, serialize(), that translates X:B, Y:B and Z:B into serialized data.

This way it can be zapped across a socket, a named pipe, etc. to a remote process.

The problem I have is, how do we create an appropriate object from the serialized data?

The only solution I can come up with is to include an identifier in the serialized data that indicates the final derived object type. The receiver first parses the derived type field from the serialized data, and then uses a switch statement (or some similar logic) to invoke the appropriate constructor.

For example:

B deserialize( serial_data )
{
    parse the derived type from the serial_data

    switch (derived type)
        case X
            return X(serial_data)
        case Y
            return Y(serial_data)
        case Z
            return Z(serial_data)
}

So after learning the derived object type we invoke the appropriate derived type constructor.
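In real C++ the pseudocode above cannot return B by value, since B is abstract and a by-value return would slice the derived object anyway. A minimal sketch of the same idea that returns a heap-allocated object follows. The wire format (one leading tag byte followed by the payload), the TypeTag enum and the describe() method are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Assumed wire format for this sketch: byte 0 is a type tag, the rest is
// the payload that the derived class receives in its constructor.
enum class TypeTag : char { X = 'X', Y = 'Y', Z = 'Z' };

class B {
public:
    virtual ~B() = default;
    virtual std::string describe() const = 0;  // hypothetical, for the demo only
};

// X, Y and Z only store the payload here; real classes would parse it.
class X : public B {
public:
    explicit X(const std::string& payload) : payload_(payload) {}
    std::string describe() const override { return "X:" + payload_; }
private:
    std::string payload_;
};

class Y : public B {
public:
    explicit Y(const std::string& payload) : payload_(payload) {}
    std::string describe() const override { return "Y:" + payload_; }
private:
    std::string payload_;
};

class Z : public B {
public:
    explicit Z(const std::string& payload) : payload_(payload) {}
    std::string describe() const override { return "Z:" + payload_; }
private:
    std::string payload_;
};

// Reads the tag, then invokes the matching derived constructor.
std::unique_ptr<B> deserialize(const std::string& serial_data) {
    if (serial_data.empty()) throw std::invalid_argument("empty message");
    const std::string payload = serial_data.substr(1);
    switch (static_cast<TypeTag>(serial_data[0])) {
        case TypeTag::X: return std::unique_ptr<B>(new X(payload));
        case TypeTag::Y: return std::unique_ptr<B>(new Y(payload));
        case TypeTag::Z: return std::unique_ptr<B>(new Z(payload));
    }
    throw std::invalid_argument("unknown type tag");
}

// Usage: the caller only ever sees the base-class interface.
void usageExample() {
    std::unique_ptr<B> obj = deserialize("Yabc");
    assert(obj->describe() == "Y:abc");
}
```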

However, this feels awkward and cumbersome. I'm hoping there is a more elegant way of doing this. Is there?

A: 
inmemory:
--------
type1 {
  chartype a;
  inttype b;
};
serialize(new type1());

serialized(ignore { and ,):
---------------------------
type1id,len{chartypeid,adata,inttypeid,bdata}

I guess that, in an ideal serialization protocol, every non-primitive type needs to be prefixed with typeid,len. Even if you serialize a single type that is not derived from anything, you would add a type id, because the other end has to know what type it's getting (regardless of the inheritance structure). So you have to include the derived class ids in the serialized data, because logically they are different types. Correct me if I am wrong.
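As a rough illustration of the typeid,len prefix, here is a sketch of one possible framing. The layout (a 1-byte type id followed by a 4-byte big-endian length) and the names Record, encodeRecord and decodeRecord are assumptions for this example, not part of any standard protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// One framed value: which type it is, plus its raw serialized bytes.
struct Record {
    std::uint8_t typeId;
    std::string payload;
};

// Assumed layout: [typeId: 1 byte][length: 4 bytes, big-endian][payload].
std::string encodeRecord(std::uint8_t typeId, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>(typeId));
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((len >> shift) & 0xFF));
    out += payload;
    return out;
}

// Decodes the record starting at offset and advances offset past it.
Record decodeRecord(const std::string& buf, std::size_t& offset) {
    const std::size_t headerSize = 5;
    if (offset > buf.size() || buf.size() - offset < headerSize)
        throw std::runtime_error("truncated header");
    Record rec;
    rec.typeId = static_cast<std::uint8_t>(buf[offset]);
    std::uint32_t len = 0;
    for (std::size_t i = 1; i < headerSize; ++i)
        len = (len << 8) | static_cast<std::uint8_t>(buf[offset + i]);
    if (buf.size() - offset - headerSize < len)
        throw std::runtime_error("truncated payload");
    rec.payload = buf.substr(offset + headerSize, len);
    offset += headerSize + len;
    return rec;
}
```

With the length in the header, a receiver can step over a record whose type id it does not recognize. If the transport already delivers whole messages and each message holds exactly one object, the length field may be redundant.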

Seeker
This is sort of the direction I was heading, but I don't think the length is needed. Of course, it would depend upon how the typeid is defined. I was envisioning it as identifying the class, which would then imply the length too. But I suspect my thinking is also shaped by the fact that I'm using POSIX Message Queues for transmission, so the transport layer knows the length.
John Rocha
+2  A: 

In fact, it's a more general issue than serialization called Virtual Constructor.

The traditional approach is to use a Factory, which returns the right derived type based on an ID. There are two solutions:

  • the switch method as you noticed, though you need to allocate on the heap
  • the prototype method

The prototype method goes like so:

// Cloneability
class Base
{
public:
  virtual ~Base() {}  // needed so derived objects can be deleted through a Base*
  virtual Base* clone() const = 0;
};

class Derived: public Base
{
public:
  virtual Derived* clone() const { return new Derived(*this); }
};

// Factory
class Factory
{
public:
  Base* get(std::string const& id) const;
  void set(std::string const& id, Base* exemplar);

private:
  typedef std::map < std::string, Base* > exemplars_type;
  exemplars_type mExemplars;
};

It is somewhat traditional to make the Factory a singleton, but that is another matter entirely.

For deserialization proper, it's easier if you have a virtual method deserialize to call on the object.
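A sketch of what such a virtual deserialize method might look like: an empty object is created from its tag, then fills itself in from the stream. The classes Point and Label, the whitespace-separated text format and the helper makeEmpty (a stand-in for the factory lookup) are all hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    // Each derived class knows how to read its own fields.
    virtual void deserialize(std::istream& in) = 0;
    virtual std::string describe() const = 0;  // hypothetical, for the demo only
};

class Point : public Base {
public:
    void deserialize(std::istream& in) override { in >> x_ >> y_; }
    std::string describe() const override {
        return "Point(" + std::to_string(x_) + "," + std::to_string(y_) + ")";
    }
private:
    int x_ = 0;
    int y_ = 0;
};

class Label : public Base {
public:
    void deserialize(std::istream& in) override { in >> text_; }
    std::string describe() const override { return "Label(" + text_ + ")"; }
private:
    std::string text_;
};

// Stand-in for the factory: maps a tag to a default-constructed object.
std::unique_ptr<Base> makeEmpty(const std::string& tag) {
    if (tag == "Point") return std::unique_ptr<Base>(new Point());
    if (tag == "Label") return std::unique_ptr<Base>(new Label());
    return nullptr;
}

// Reads "<tag> <fields...>" and lets the object deserialize itself.
std::unique_ptr<Base> readObject(std::istream& in) {
    std::string tag;
    if (!(in >> tag)) return nullptr;
    std::unique_ptr<Base> obj = makeEmpty(tag);
    if (obj) obj->deserialize(in);
    return obj;
}

void usageExample() {
    std::istringstream in("Point 3 4");
    std::unique_ptr<Base> obj = readObject(in);
    assert(obj && obj->describe() == "Point(3,4)");
}
```

The code that picks the type stays small, and the knowledge of each type's field layout lives in the class itself.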

EDIT: How does the Factory work ?

In C++ you can't create a type you don't know about. The idea above is therefore that the task of building a Derived object is given to the Derived class, by way of the clone method.

Next comes the Factory. We are going to use a map which will associate a "tag" (for example "Derived") with an instance of an object (say Derived here).

Factory factory;
Derived derived;
factory.set("Derived", &derived);

Now, when we want to create an object whose type we don't know at compile time (because the type is decided on the fly), we pass a tag to the factory and ask for an object in return.

std::unique_ptr<Base> base(factory.get("Derived"));

Under the covers, the Factory will find the Base* associated with the "Derived" tag and invoke the clone method of that object. This will actually (here) create an object of runtime type Derived.

We can verify this by using the typeid operator:

assert( typeid(*base) == typeid(Derived) );
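The bodies of get and set are not shown above, so here is one way they might be written, as a self-contained sketch. Returning std::unique_ptr from get, storing non-owning const pointers and returning null for an unknown tag are choices made for this example only.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeinfo>

class Base {
public:
    virtual ~Base() = default;
    virtual Base* clone() const = 0;
};

class Derived : public Base {
public:
    Derived* clone() const override { return new Derived(*this); }
};

class Factory {
public:
    // Returns a fresh copy of the exemplar registered under id,
    // or a null pointer if no exemplar was registered for that tag.
    std::unique_ptr<Base> get(const std::string& id) const {
        std::map<std::string, const Base*>::const_iterator it = mExemplars.find(id);
        if (it == mExemplars.end()) return nullptr;
        return std::unique_ptr<Base>(it->second->clone());
    }

    // Registers an exemplar. The factory does not own it, so the
    // exemplar must outlive the factory (an assumption of this sketch).
    void set(const std::string& id, const Base* exemplar) {
        mExemplars[id] = exemplar;
    }

private:
    std::map<std::string, const Base*> mExemplars;
};

void usageExample() {
    Factory factory;
    Derived derived;
    factory.set("Derived", &derived);

    std::unique_ptr<Base> base = factory.get("Derived");
    assert(base && base.get() != &derived);    // a new object, not the exemplar
    assert(typeid(*base) == typeid(Derived));  // runtime type is Derived
}
```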
Matthieu M.
Pardon my ignorance. But I'm not catching how a Factory solves the problem. In fact, I fell off the truck at the factory class above.
John Rocha
No problem, we are all here to learn. I've added an explanation of how the factory solves the virtual construction issue. Don't get confused by the `map` structure; it's just a fancy switch that can be filled up at runtime.
Matthieu M.
Thank you for the additional information. So, if I understand this correctly, we still need some sort of tag to tell us what type of object it is. The factory uses this tag to find the "registered" exemplar that returns a new clone. Correct me if I am wrong, but this looks like a "dynamic" switch statement. Instead of having a hard-coded switch like I have above, the factory gives you a "flexible" way to provide the same functionality? Or am I missing something? BTW, THANKS! I've already adapted this solution for another problem I was having!
John Rocha
@John Rocha: you're absolutely correct! You need a tag for this "dynamic" switch statement. It's not much different from your solution; the only advantage is the dynamic bit, because you don't have to modify the same file (the switch) each time you add a new type to your collection.
Matthieu M.