Hi!

I'm building a distributed C++ application that needs to do a lot of serialization and deserialization of simple data structures that are passed between different processes and computers.

I'm not interested in serializing complex class hierarchies, but rather in sending structures with a few simple members such as numbers, strings and data vectors. The data vectors can often be many megabytes large. I'm worried that text/XML-based approaches are too slow, and I really don't want to write this myself, since problems like string encoding and number endianness can make it far more complicated than it looks on the surface.

I've been looking a bit at protocol buffers and boost.serialization. According to the documentation, protocol buffers seem to care a lot about performance. Boost seems somewhat more lightweight in the sense that you don't need an external language for specifying the data format, which I find quite convenient for this particular project.

So my question comes down to this: does anyone know if the boost serialization is fast for the typical use case I described above?

Also if there are other libraries that might be right for this, I'd be happy to hear about them.

+3  A: 

ACE and ACE TAO come to mind, but you might not like their size and scope. http://www.cs.wustl.edu/~schmidt/ACE.html

Regarding your query about "fast" and boost: that is a subjective term, and without knowing your requirements (throughput, etc.) it is difficult to answer that for you. Not that I have any benchmarks for the boost stuff myself...

There are messaging layers you can use, but those are probably slower than boost. I'd say that you identified a good solution in boost, but I've only used ACE and other proprietary communications/messaging products.

Tim
+1  A: 

boost.serialization doesn't care about string encodings or endianness. If those matter to you, you're no better off using it than not using it.

You might want to look into ICE from ZeroC: http://www.zeroc.com/

It works similarly to CORBA, except that it's entirely specced and defined by the company. The upside is that the implementations work as intended, since there aren't all that many. The downside is that if you're using a language they don't support, you're out of luck.

Note the licensing terms: it is free for open source projects, but quite expensive for commercial applications (as are most commercial CORBA ORBs anyway).
David Rodríguez - dribeas
Yes, it's expensive for commercial applications. If you factor out the hassle you'll invariably run into if you go for a CORBA or SOAP approach (mixing ORBs with different interpretations of the specs), the price is pretty good IMO :)
Good point about boost not supporting endianness. I thought the Data Portability goal on the boost serialization index page implied that it actually was supported. However, that point is also mentioned in the project's todo list :)
Laserallan
From what I gather, boost serialization is portable as long as you don't use the binary serialization technique. Apparently both text and xml produce portable data. However, if you notice that disk space becomes an issue, you might want to reconsider. I think there is a portable binary format underway.
Statement
+2  A: 

Don't pre-emptively optimize. Measure first and optimize second.
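
To make "measure first" concrete, here is a rough timing sketch in C++. Everything named in it is an assumption for illustration: `serialize_stub` is a made-up stand-in that you would swap for the real library call, and the helper names and payload size are arbitrary. It only shows the shape of a measurement, not a rigorous benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the serializer under test: writes a 4-byte
// little-endian length header followed by a copy of the payload.
inline std::vector<std::uint8_t> serialize_stub(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out(4 + payload.size());
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF);
    if (!payload.empty())
        std::memcpy(out.data() + 4, payload.data(), payload.size());
    return out;
}

// Average wall-clock seconds per call of f over `repeats` runs.
template <typename F>
double seconds_per_call(F&& f, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}

// Serialization throughput in MiB/s for a payload of the given size.
inline double throughput_mib_per_s(std::size_t payload_bytes, int repeats) {
    const std::vector<std::uint8_t> payload(payload_bytes, 0xAB);
    volatile std::size_t sink = 0;  // keeps the optimizer from discarding the work
    const double secs = seconds_per_call(
        [&] { sink = sink + serialize_stub(payload).size(); }, repeats);
    return (static_cast<double>(payload_bytes) / (1024.0 * 1024.0)) / secs;
}
```

Calling something like `throughput_mib_per_s(8 * 1024 * 1024, 20)` with each candidate library plugged in, on payloads sized like the real ones, gives a number to compare against the bandwidth of the link between the processes.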

plinth
I think this approach is good for things that are easily replaced. External libraries might not be that hard to change, but I definitely think it's worth doing some homework before making a decision if I can see a problem coming.
Laserallan
+1  A: 

My guess is that boost is fast enough. I have used it in previous projects to serialize data to and from disk, and its performance never even came up as an issue.

My answer here talks about serialization in general, which may be helpful to you beyond which serialization library you choose to use.

Having said that, it looks like you know most of the main trouble spots with serialization (endianness, string encoding). You did leave out versioning and forwards/backwards compatibility. If time is not critical, I recommend writing your own serialization code. It is an enlightening experience, and the lessons you learn are invaluable. Though I will warn you it will tend to make you hate XML-based protocols for their bloatedness. :)
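
For a sense of what hand-written serialization involves, here is a minimal sketch in C++. None of the specifics come from this thread; they are assumptions made for illustration: the `Sample` struct, the one-byte version tag, fixed-width little-endian integers, and 32-bit length prefixes. Strings are copied as raw bytes, so both ends would still have to agree on an encoding such as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical message type, made up for this illustration.
struct Sample {
    std::uint32_t id = 0;
    std::string name;                // opaque bytes; encoding agreed out of band
    std::vector<std::uint8_t> data;  // may be many megabytes
};

constexpr std::uint8_t kFormatVersion = 1;

// Append a 32-bit value in little-endian order, whatever the host byte order is.
inline void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Read a little-endian 32-bit value and advance pos; throws on truncated input.
inline std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4)
        throw std::runtime_error("truncated input");
    std::uint32_t v = 0;
    for (int shift = 0; shift < 32; shift += 8)
        v |= static_cast<std::uint32_t>(in[pos++]) << shift;
    return v;
}

inline std::vector<std::uint8_t> encode(const Sample& s) {
    // This sketch uses 32-bit length prefixes, so fields are limited to 4 GiB.
    assert(s.name.size() <= UINT32_MAX && s.data.size() <= UINT32_MAX);
    std::vector<std::uint8_t> out;
    out.reserve(1 + 12 + s.name.size() + s.data.size());
    out.push_back(kFormatVersion);  // lets a reader reject or adapt to other layouts
    put_u32(out, s.id);
    put_u32(out, static_cast<std::uint32_t>(s.name.size()));
    out.insert(out.end(), s.name.begin(), s.name.end());
    put_u32(out, static_cast<std::uint32_t>(s.data.size()));
    out.insert(out.end(), s.data.begin(), s.data.end());
    return out;
}

inline Sample decode(const std::vector<std::uint8_t>& in) {
    if (in.empty()) throw std::runtime_error("empty input");
    std::size_t pos = 0;
    if (in[pos++] != kFormatVersion)
        throw std::runtime_error("unsupported format version");
    Sample s;
    s.id = get_u32(in, pos);

    const std::uint32_t name_len = get_u32(in, pos);
    if (in.size() - pos < name_len) throw std::runtime_error("truncated name");
    auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.name.assign(first, first + static_cast<std::ptrdiff_t>(name_len));
    pos += name_len;

    const std::uint32_t data_len = get_u32(in, pos);
    if (in.size() - pos < data_len) throw std::runtime_error("truncated data");
    first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.data.assign(first, first + static_cast<std::ptrdiff_t>(data_len));
    return s;
}
```

Even at this size, the version byte and the bounds checks take up a good share of the code, which is roughly where the versioning and compatibility work starts to show.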

Whichever path you choose, good luck with your project.

grieve
+3  A: 

If you are only sending well-defined data structures, then perhaps you should be looking at ASN.1 as an encoding methodology?

Peter M
+5  A: 

I would strongly suggest protocol buffers. They're incredibly simple to use, offer great performance, and take care of issues like endianness and backwards compatibility. To make it even more attractive, serialized data is language-independent thanks to numerous language implementations.
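
As an illustration of why endianness stops being a concern: the protocol buffers wire format stores integers as base-128 varints, which are defined byte by byte instead of as a raw memory dump. The sketch below is a hand-rolled version of that idea in C++, not the protobuf library API. The function names `write_varint` and `read_varint` are made up for this example, and in real use the generated classes do this for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append v as a base-128 varint: 7 payload bits per byte, least significant
// group first; a set high bit means another byte follows.
inline void write_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Decode one varint starting at pos and advance pos past it.
// Returns false if the input ends early or the varint exceeds 10 bytes.
inline bool read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                        std::uint64_t& value) {
    assert(pos <= in.size());  // a position past the end is a caller bug
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            value = result;
            return true;
        }
    }
    return false;
}
```

Because the encoding is defined in terms of individual bytes, the same byte sequence decodes to the same number on big-endian and little-endian machines, and small values take fewer bytes on the wire.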

Bill
+2  A: 

Also check out ONC RPC (the old Sun RPC).

Malkocoglu
We have used Sun RPC to do this for 15 years, very successfully. The C++ code it produces is simple to use, and it works on all the OSes I have tried.
David Allan Finch
A: 

There's also Thrift, which looks like an alpha project but is used and developed by Facebook, so it does have some users.

Or good old DCE, which was the standard MS decided to use for COM. It's now open source, 20 years too late, but better late than never.

gbjbaanb