views:

160

answers:

2

Ok, the context is some serialization / deserialization code that will parse a byte stream into an 'object' representation that's easier to work with (and vice-versa).

Here's a simplified example with a base message class and then depending on a 'type' header, some more data/function are present and we must choose the right subclass to instantiate :

class BaseMessage {
public:
    enum Type {
        MyMessageA = 0x5a,
        MyMessageB = 0xa5,
    };

    BaseMessage(Type type) : mType(type) { }
    virtual ~BaseMessage() { }

    Type type() const { return mType; } 

protected:
    Type mType;

    virtual void parse(void *data, size_t len);
};

class MyMessageA {
public:
    MyMessageA() : BaseMessage(MyMessageA) { }

    /* message A specific stuf ... */

protected:
    virtual void parse(void *data, size_t len);
};

class MyMessageB {
public:
    MyMessageB() : BaseMessage(MyMessageB) { }

    /* message B specific stuf ... */

protected:
    virtual void parse(void *data, size_t len);
};

In a real examples, there would be hundreds of different message types and possibly several level or hierarchy because some messages share fields/functions with each other.

Now, to parse a byte string, I'm doing something like :

BaseMessage *msg = NULL;
Type type = (Type)data[0];

switch (type) {
    case MyMessageA:
        msg = new MyMessageA();
        break;

    case MyMessageB:
        msg = new MyMessageB();
        break;

    default:
        /* protocol error */
}

if (msg)
    msg->parse(data, len);

But I don't find this huge switch very elegant, and I have the information about which message has which 'type value' twice (once in the constructor, one in this switch) It's also quite long ...

I'm looking for a better way that would just be better ... Does anyone has any idea how to improve this ?

+8  A: 

One way of approaching it would be using a map and register some kind of factory function for each message type. This means that you get rid of the switch case and can add and remove messages dynamically.

The code would look something like:

// Create the map (most likely a member in a different class)
std::map<BaseMessage::Type, MessageCreator*> messageMap;
...

// Register some message types
// Note that you can add and remove messages at runtime here
messageMap[BaseMessage::MyMessageA] = new MessageCreatorT<BaseMessageA>();
messageMap[BaseMessage::MyMessageB] = new MessageCreatorT<BaseMessageB>();
...

// Handle a message
std::map<Type, MessageCreator*>::const_iterator it = messageMap.find(msgType);
if(it == messageMap.end()) {
    // Unknown message type
    beepHang();
}
// Now create the message
BaseMessage* msg = it->second.createMessage(data);

The MessageCreator class would look something like this:

class MessageCreator {
    public:
    virtual BaseMessage* createMessage(void* data, size_t len) const = 0;
};
template<class T> class MessageCreatorT : public MessageCreator {
    public:
    BaseMessage* createMessage(void* data, size_t len) const {
        T* newMessage = new T();
        newMessage.parse(data, len);
        return newMessage;
    }
};
Laserallan
Very interesting approach indeed. The runtime alteration aspect isn't really useful in my specific case but it's a nice bonus that could prove useful in some other cases.I'll wait a little see if someone else is before confirming as answered.
246tNt
Obviously you cannot add message types at runtime. Anyway, the switch case has been transformed to one line per message type registration entry into the map, don't see the big win here, certainly no less amount of resulting code, more efficient? Less clutter probably?
Murali VP
I'm not very concerned performance wise, I'm not processing hundreds of thousands of messages per second, the code style is more the concern here.Also the win I see here (even if not obvious from the question), is that I can also add other methods than the 'createMessage' if I need other thing of the sort, like defining which 'control class' will handle the message and stuff.For the runtime alteration: You can modify the map at runtime. To add new messages you could dynload them from shared objects like plugins and such.
246tNt
One issue remaining is that I still need to have the 'type' repeated in the Message{A,B} constructor (when building a message not from deserialization). Unless I go through the map for this too but this may not be as pretty in the code.
246tNt
+3  A: 

It's a pretty basic question in fact (as you can imagine, you are definitely not the only one deserializing in C++).

What you are looking for is called Virtual Construction.

C++ does not define Virtual Construction, but it's easy to approximate it using the Prototype Design Pattern or using a Factory method.

I personnally prefer the Factory approach, for the reason that the Prototype one means having some kind of default instance that is replicated and THEN defined... the problem is that not all classes have a meaningful default, and for that matter, a meaningful Default Constructor.

The Factory approach is easy enough.

  • You need a common base class for the Messages, and another for the Parsers
  • Each Message has both a Tag and an associated Parser

Let's see some code:

// Framework
class Message
{
public:
  virtual ~Message();
};

class Parser
{
public:
  virtual ~Parser();
  virtual std::auto_ptr<Message> parse(std::istream& serialized) const;
};

// Factory of Messages
class MessageFactory
{
public:
  void register(std::string const& tag, Parser const& parser);
  std::auto_ptr<Message> build(std::string const& tag, std::istream& serialized) const;
private:
  std::map<std::string,Parser const*> m_parsers;
};

And with this framework (admittedly simple), some derived classes:

class MessageA: public Message
{
public:
  MessageA(int a, int b);
};

class ParserA: public Parser
{
public:
  typedef std::auto_ptr<MessageA> result_type;
  virtual result_type parse(std::istream& serialized) const
  {
    int a = 0, b = 0;
    char space = 0;
    std::istream >> a >> space >> b;
    // Need some error control there
    return result_type(new MessageA(a,b));
  }
};

And at last, the use:

int main(int argc, char* argv[])
{
  // Register the parsers
  MessageFactory factory;
  factory.register("A", ParserA());

  // take a file
  // which contains 'A 1 2\n'
  std::ifstream file = std::ifstream("file.txt");
  std::string tag;
  file >> tag;
  std::auto_ptr<Message> message = factory.parse(tag, file);

  // message now points to an instance of MessageA built by MessageA(1,2)
}

It works, I know for I use it (or a variation).

There are some things to consider:

  • You may be willing to make MessageFactory a singleton, this then allows it to be called at library load, and thus you can register your parsers by instantiating static variables. This is very handy if you don't want main to have to register every single parser type: locality > less dependencies.
  • The tags have to be shared. It is not unusual either for the tag to be served by a virtual method of the Message class (called tag).

Like:

class Message
{
public:
  virtual ~Message();
  virtual const std::string& tag() const = 0;
  virtual void serialize(std::ostream& out) const;
};
  • The logic for serialization has to be shared too, it is not unusual for an object to handle its own serialization/deserialization

Like:

class MessageA: public Message
{
public:
  static const std::string& Tag();
  virtual const std::string& tag() const;
  virtual void serialize(std::ostream& out) const;

  MessageA(std::istream& in);
};

template <class M>
class ParserTemplate: public Parser // not really a parser now...
{
public:
  virtual std::auto_ptr<M> parse(std::istream& in) const
  {
    return std::auto_ptr<M>(new M(in));
  }
};

What's great with templates is that it never stops to amaze me

class MessageFactory
{
public:
  template <class M>
  void register()
  {
    m_parsers[M::Tag()] = new ParserTemplate<M>();
  }
};

//skipping to registration
  factory.register<MessageA>();

Now isn't it pretty :) ?

Matthieu M.