views:

518

answers:

5

I'm building a system, with C++, that uses Tokyo Cabinet (original API in C). The problem is I want to store a class such as:

    class Entity {
      public:
        string entityName;
        short type;
        vector<another_struct> x;
        vector<another_struct> y;
        vector<string> z;
    };

The problem is that vectors and strings have variable length. When I pass a void* (my object) to Tokyo Cabinet so it can store it, I also have to pass the size of the object in bytes. But that can't be trivially done.

What is the best way to determine the number of bytes of an object? Or what is the best way to store variable-length objects in Tokyo Cabinet?

I'm already considering looking for serialization libs.

Thanks

+4  A: 

I think it is worse than that. The actual storage for the vectors is not contiguous with the rest of the object. You see, std::vector<>s keep their data in separate allocations on the heap (so they can expand them if needed). You'll need an API that understands C++ and the STL.

In short, this isn't going to work.
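To make the point concrete, here is a minimal sketch, assuming a simplified stand-in for the question's Entity (another_struct is left out, and the helper names payload_bytes and object_size_is_fixed are made up for illustration). It shows that sizeof never changes, no matter how much data the vector holds, so the size you would pass along with the void* tells Tokyo Cabinet nothing about the real payload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the question's Entity (another_struct omitted).
struct Entity {
    std::string entityName;
    short type = 0;
    std::vector<std::string> z;
};

// Hypothetical helper: counts the characters held by the strings.
// This data is owned by the string/vector members, not laid out inside
// the Entity object itself (allocator overhead is ignored here).
std::size_t payload_bytes(const Entity& e) {
    std::size_t n = e.entityName.size();
    for (const std::string& s : e.z) n += s.size();
    return n;
}

// Hypothetical helper: returns true if sizeof is unaffected by adding
// 1000 strings of 100 characters each to the vector.
bool object_size_is_fixed() {
    Entity e;
    const std::size_t before = sizeof(e);
    e.z.assign(1000, std::string(100, 'x'));
    // sizeof is a compile-time constant, so it cannot reflect the new data.
    return sizeof(e) == before && payload_bytes(e) == 100000;
}
```

Copying sizeof(Entity) bytes would therefore store only the internal pointers and bookkeeping of the members, which are meaningless once the process exits.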

dmckee
I'm also afraid of that. I think I need a serialization lib.
Felipe Hummel
+9  A: 

You cannot portably treat a non-POD C++ struct/class as a raw sequence of bytes. This holds regardless of whether it uses pointers, std::string or std::vector, though the latter virtually guarantee that it will break in practice. You need to serialize the object into a sequence of chars first. I'd suggest Boost.Serialization as a good, flexible cross-platform serialization framework.
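As a rough illustration of what "serialize into a sequence of chars" means, here is a hand-rolled, length-prefixed sketch. It is not Boost.Serialization, just a simplified substitute with no dependencies. The Entity below is a cut-down stand-in (another_struct omitted), and put_u32, get_u32, put_str, get_str, serialize and deserialize are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Cut-down stand-in for the question's Entity (another_struct omitted).
struct Entity {
    std::string entityName;
    std::int16_t type = 0;
    std::vector<std::string> z;
};

// Append a 32-bit unsigned integer in little-endian byte order.
void put_u32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a 32-bit little-endian integer at pos and advance pos.
// std::string::at throws std::out_of_range on truncated input.
std::uint32_t get_u32(const std::string& in, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(
                 static_cast<unsigned char>(in.at(pos + i))) << (8 * i);
    pos += 4;
    return v;
}

// A string is stored as its length followed by its bytes.
void put_str(std::string& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.append(s);
}

std::string get_str(const std::string& in, std::size_t& pos) {
    const std::uint32_t n = get_u32(in, pos);
    if (n > in.size() - pos) throw std::out_of_range("truncated string");
    std::string s = in.substr(pos, n);
    pos += n;
    return s;
}

// Flatten an Entity into one contiguous buffer.
std::string serialize(const Entity& e) {
    std::string out;
    put_str(out, e.entityName);
    put_u32(out, static_cast<std::uint16_t>(e.type));
    put_u32(out, static_cast<std::uint32_t>(e.z.size()));
    for (const std::string& s : e.z) put_str(out, s);
    return out;
}

// Rebuild an Entity from a buffer produced by serialize().
Entity deserialize(const std::string& in) {
    std::size_t pos = 0;
    Entity e;
    e.entityName = get_str(in, pos);
    e.type = static_cast<std::int16_t>(
        static_cast<std::uint16_t>(get_u32(in, pos)));
    const std::uint32_t n = get_u32(in, pos);
    for (std::uint32_t i = 0; i < n; ++i) e.z.push_back(get_str(in, pos));
    return e;
}
```

The resulting buffer is contiguous, so its data() and size() are exactly the pointer and byte count a C API expects. With Tokyo Cabinet's hash database that would presumably be something like tchdbput(hdb, key, key_size, buf.data(), buf.size()). A real library additionally handles versioning and nested types, which this sketch does not attempt.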

Pavel Minaev
A: 

I've had a similar problem although I use HDF5. In my case there is an additional requirement that I can read sub-parts of the object and so serialization is not really an option.

HDF is very much like a large array where an index is used to access the data. The solution that I use is to add a "previous index" to the table that stores the another_struct type.

Taking your example, if 'x' and 'y' had 3 and 2 elements each, then the data would be stored as follows:

    [ index ] [ another_struct data here ] [ previous_index ]
    [   0   ] [       x data 0           ] [ -1 ]
    [   1   ] [       x data 1           ] [  0 ]
    [   2   ] [       x data 2           ] [  1 ]
    [   3   ] [       y data 0           ] [ -1 ]
    [   4   ] [       y data 1           ] [  3 ]

And then, in the main Entity table, the last index added is stored:

    [ index ] [ Entity data here ] [ x ] [  y ]
    [   0   ] [        ...       ] [ 2 ] [  4 ]

I'm not that familiar with how Tokyo Cabinet works, so although this approach should work, it may not be optimal for that data format. Ideally, if you can have pointers to real Tokyo Cabinet objects, you could store those pointers instead of the indexes I have used above.
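The "previous index" layout can be sketched in memory as follows. This is only an illustration of the chaining idea, with a std::vector standing in for the table and an int standing in for another_struct. Row, append_list and read_list are hypothetical names, and nothing here touches HDF5 or Tokyo Cabinet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the element table: the payload plus the index of the
// previous element of the same list (-1 marks the first element).
struct Row {
    int data;      // stand-in for another_struct
    int previous;  // index of the previous row in this list, or -1
};

// Append all items as a chained list and return the index of the last
// row added (-1 if items is empty). That index is what the Entity
// table would store in its x or y column.
int append_list(std::vector<Row>& table, const std::vector<int>& items) {
    int prev = -1;
    for (int item : items) {
        table.push_back(Row{item, prev});
        prev = static_cast<int>(table.size()) - 1;
    }
    return prev;
}

// Follow the previous links back from the last row, then reverse so
// the items come out in their original order.
std::vector<int> read_list(const std::vector<Row>& table, int last) {
    std::vector<int> out;
    for (int i = last; i != -1;
         i = table.at(static_cast<std::size_t>(i)).previous) {
        out.push_back(table.at(static_cast<std::size_t>(i)).data);
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Appending a three-element x and then a two-element y to an empty table yields last indexes 2 and 4, matching the tables above.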

Richard Corden
A: 

Yes, you'd be better off using Boost.Serialization or protobuf to serialize the object and then putting it into Tokyo Cabinet.

frank tang
A: 

I use Protocol Buffers to store my C++ objects as Tokyo Cabinet data values.

In Protocol Buffers, you specify the structure and then generate the marshalling/unmarshalling code for C++, Python, and Java. In your case the .proto file would look like:

    message Entity {
        optional string entityName = 1;
        optional int32 type = 2; // protobuf has no short
        repeated AnotherStruct x = 3;
        repeated AnotherStruct y = 4;
        repeated string z = 5;
    }

A format that can be updated, e.g. to cover new fields, is very valuable, especially if the database exists over a long timespan. In contrast to XML and other formats, protobuf is also quite fast.

dmeister