Greetings,

I am working on a distributed pub-sub system that must have minimal latency. I now have to choose between serialization/deserialization and a raw data buffer. I prefer the raw data approach because it has almost no overhead, which keeps latency low. But my colleague argues that I should use marshaling because the parser will be less complex and less buggy. I raised my concern about latency, but he said it will be worth it in the long run, and there are even FPGA devices to accelerate it.

What are your opinions on this?

TIA!

A: 

Using a 'raw data' approach, if hardcoded in one language for one platform, causes problems when you try to write code on another platform in another language (or sometimes even in the same language on the same platform with a different compiler, if your fields don't have natural alignment).

I recommend using an IDL to describe your message formats. If you pick one that is reducible to 'raw data' in your language of choice, the generated field accessors for that language do nothing more than read variables in a structure overlaid on your data buffer. The IDL still describes the message in a platform- and language-neutral way, so code generators can emit more elaborate parsers for other platforms.

The downside of picking something that is reducible to a C structure overlay is that it doesn't handle optional fields or variable-length arrays, and it may not handle backward-compatible extensions in the future (unless you just tack new fields onto the end of the structure). I'd suggest you read about Google's Protocol Buffers if you haven't yet.

KeyserSoze
Thanks for your answer. I actually read this interesting article: http://mnb.ociweb.com/mnb/MiddlewareNewsBrief-201004.html . To me, the extra latency introduced by Google Protocol Buffers and Boost serialization is quite large, especially the latter. That's why I favored "raw data" over the alternatives, but you are definitely right about the cross-platform and multi-language issues.