Hi,

I have a set of classes whose data I wish to serialize. There is a lot of data, though (we're talking about a std::map with a million or more class instances).

Not wishing to optimize my code too early, I thought I'd try a simple and clean XML implementation, so I used TinyXML to save the data out to XML, but it was just far too slow. I've since started looking at Boost.Serialization, writing and reading plain ASCII or binary.

It seems much better suited to the task, as I don't have to allocate all that memory as overhead before I get started.

My question is essentially how to go about planning an optimal serialization strategy for a file format. I don't particularly want to serialize the whole map if it's not necessary, as it's really only the contents I'm after. Having played around with serialization a little (and looked at the output), I don't understand how the loading code could know when it has reached the end of the map, for example, if I simply save out all the items one after another. What issues do you need to consider when planning a serialization strategy?

Thanks.

+3  A: 

Read this FAQ! Does that help you get started?

dirkgently
+1  A: 

I don't particularly want to serialize the whole map if it's not necessary, as it's really only the contents I'm after.

Does that mean you don't really need to serialize the whole object? Maybe you should reconsider just using a text-based format. If you really need to serialize only a subset of the key/value pairs in a map then you should probably just write them to a text file and read them in later. You don't necessarily need XML; just one line per map key followed by one line with the value should work.
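A minimal sketch of that scheme, assuming (purely for illustration) string keys and values with no embedded newlines:

#include <fstream>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Dict;

// One line per key followed by one line per value.
void save(const Dict& m, const char* path) {
    std::ofstream out(path);
    for (Dict::const_iterator it = m.begin(); it != m.end(); ++it)
        out << it->first << '\n' << it->second << '\n';
}

// Read pairs until end-of-file; the stream itself tells us when to stop,
// which answers the "how do I know I've reached the end?" concern.
void load(Dict& m, const char* path) {
    std::ifstream in(path);
    std::string key, value;
    while (std::getline(in, key) && std::getline(in, value))
        m[key] = value;
}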

+1  A: 

If all you want is key/value pairs, then the important thing is the types the keys and values take; this will colour how you deal with things.

Serialising the map itself would be a poor plan in general, since you may wish to change your associative container type later without invalidating (or having to translate) previously serialised files.

Serialising the container itself can be useful in certain circumstances, if you wish to avoid the cost of rebuilding the container on load (though pre-sizing the container is normally sufficient to avoid the vast majority of this overhead), but this should be a decision based on specific aspects of your application and usage.
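For a std::map specifically there is no pre-sizing as such, but when the entries come back in key order, hinted insertion achieves much the same effect. A sketch (the int/string types are just for illustration):

#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rebuild a std::map from entries saved in key order. Each new key is the
// largest seen so far, so inserting with the end() hint is amortised
// constant time per element instead of a full tree search.
std::map<int, std::string>
rebuild(const std::vector<std::pair<int, std::string> >& sorted_entries) {
    std::map<int, std::string> m;
    for (std::size_t i = 0; i < sorted_entries.size(); ++i)
        m.insert(m.end(), sorted_entries[i]);
    return m;
}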

If you supply the types of the keys/values we can help more. Without this, here are some general tips:

  • If they are amenable to string representation, then a simple CSV file may be sufficient (but use an existing reader/writer library for it; reading and writing legitimate CSV is harder than it looks).
  • If they are fixed width, then a simple binary format will make reading and writing very easy and quick (see the sketch after this list), but care should be taken to acknowledge the issues of:
    • endianness
    • whether you wish to allow simple catting of such files together or to add CRC-like values for integrity (you can do both, but it's harder)
    • you lose the ability to grep the files (this is a real loss; you may end up having to reinvent parts of your toolchain for this)
    • whether changing platform/compiler/size_t will break the format
  • Some structured textual format that is lighter than XML: there are several (JSON, YAML, etc.). These will provide extensibility you quite likely don't require.
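To illustrate the fixed-width binary option, here is a minimal sketch (the uint32_t key / double value record is an assumption; a real format would also want a magic number and a version header):

#include <stdint.h>
#include <cstdio>
#include <map>
#include <utility>

// A fixed-width record: every entry occupies exactly 12 bytes on disk.
#pragma pack(push, 1)
struct Record {
    uint32_t key;
    double   value;
};
#pragma pack(pop)

bool save(const std::map<uint32_t, double>& m, const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    for (std::map<uint32_t, double>::const_iterator it = m.begin();
         it != m.end(); ++it) {
        Record r = { it->first, it->second };
        // NOTE: this sketch assumes a little-endian host; on a big-endian
        // machine the bytes would need swapping here.
        std::fwrite(&r, sizeof r, 1, f);
    }
    return std::fclose(f) == 0;
}

bool load(std::map<uint32_t, double>& m, const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    Record r;
    // Fixed-width records mean the end of the map is simply end-of-file.
    while (std::fread(&r, sizeof r, 1, f) == 1)
        m.insert(m.end(), std::make_pair(r.key, r.value));
    std::fclose(f);
    return true;
}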
ShuggyCoUk
Apologies, I missed saying in my original question that I intended to leave XML entirely, as I felt plain ASCII/binary would suffice (I have now edited the question). Thanks for your points though; there's some useful information there.
Dan
+1  A: 

Use Google's Protocol Buffers, a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.

There are bindings for C++, Java, Python, Perl, C#, and Ruby.

You describe your data in .proto definition files:

message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}
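
Running the protocol buffer compiler over that file (for example, protoc --cpp_out=. person.proto) generates person.pb.h and person.pb.cc, which define the Person class used below.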

Then you would use it in C++ like this:

#include <fstream>
#include "person.pb.h"  // generated header; declares the Person class
using namespace std;

Person person;
person.set_id(123);
person.set_name("Bob");
person.set_email("bob@example.com");  // placeholder address

// Write the message in its compact binary wire format.
fstream out("person.pb", ios::out | ios::binary | ios::trunc);
person.SerializeToOstream(&out);
out.close();

Or like this:

// (same includes as above, plus <iostream> and <cstdlib> for the I/O and exit)
Person person;
fstream in("person.pb", ios::in | ios::binary);
if (!person.ParseFromIstream(&in)) {
  cerr << "Failed to parse person.pb." << endl;
  exit(1);
}

cout << "ID: " << person.id() << endl;
cout << "name: " << person.name() << endl;
// optional fields come with a has_xxx() accessor
if (person.has_email()) {
  cout << "e-mail: " << person.email() << endl;
}

For a more complete example, see the tutorials.

chrish
+2  A: 

There are many advantages to Boost.Serialization. For instance, as you say, just including a method with a specified signature allows the framework to serialize and deserialize your data. Also, Boost.Serialization includes serializers and readers for all the standard STL containers, so you don't have to worry about whether all keys have been stored (they will be) or how to detect the last entry in the map when deserializing (it will be detected automatically).
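
As a rough sketch of what that looks like in practice (the MyData class and its fields are invented for illustration; a text or XML archive could be swapped in for the binary one):

#include <fstream>
#include <map>
#include <string>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/map.hpp>     // teaches the archives about std::map
#include <boost/serialization/string.hpp>

class MyData {
public:
    MyData() : value(0) {}
    MyData(int v, const std::string& s) : value(v), label(s) {}
private:
    friend class boost::serialization::access;
    // The single method with the specified signature; the framework
    // drives it for both saving and loading.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & value;
        ar & label;
    }
    int value;
    std::string label;
};

void save_map(const std::map<int, MyData>& m, const char* path) {
    std::ofstream ofs(path, std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << m;  // the element count is written as part of the map...
}

void load_map(std::map<int, MyData>& m, const char* path) {
    std::ifstream ifs(path, std::ios::binary);
    boost::archive::binary_iarchive ia(ifs);
    ia >> m;  // ...so the end of the map is detected automatically
}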

There are, however, some considerations to keep in mind. For example, if you have fields in your class that are calculated, or used as speed-ups, such as indexes or hash tables, you don't have to store these, but you do have to remember to reconstruct them from the data read from disk.
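
With Boost.Serialization, one way to handle that is to split the serialize method into save/load halves, so the load side can rebuild the derived member after reading. A sketch (the checksum member and its rebuild logic are invented for illustration):

#include <string>
#include <boost/serialization/access.hpp>
#include <boost/serialization/split_member.hpp>
#include <boost/serialization/string.hpp>

class Record {
public:
    Record() : checksum(0) {}
    explicit Record(const std::string& t) : text(t) { rebuild_checksum(); }
private:
    friend class boost::serialization::access;

    // Save only the real data...
    template <class Archive>
    void save(Archive& ar, const unsigned int /*version*/) const {
        ar & text;
    }
    // ...and recompute the derived field after loading it back.
    template <class Archive>
    void load(Archive& ar, const unsigned int /*version*/) {
        ar & text;
        rebuild_checksum();
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()

    void rebuild_checksum() {
        checksum = 0;
        for (std::string::size_type i = 0; i < text.size(); ++i)
            checksum += static_cast<unsigned char>(text[i]);
    }

    std::string text;
    unsigned long checksum;  // derived from 'text'; cheap to rebuild, so not stored
};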

As for the "file format" you mention, I think we sometimes try to focus on the format rather than on the data. I mean, the exact format of the file doesn't matter as long as you are able to retrieve the data seamlessly using (say) Boost.Serialization. If you want to share the file with other utilities that don't use the same serialization library, that's another matter. But just for the purposes of (de)serialization, you don't have to care about the internal file format.

Diego Sevilla