Background:
I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf-marshalled data to a file using C++. As it's recommended to keep each protobuf message under 1 MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset, until the end of the file is reached. This way, each protobuf can stay under 1 MB, and I can glob them together to my heart's content.
[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
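For concreteness, here's a minimal sketch of the write side of this framing, assuming a hypothetical generated message type `MyMessage` (a stand-in for whatever type the file actually stores); the size prefix is written as a raw `int32`, the way my current code does:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

#include "my_message.pb.h"  // hypothetical generated header for MyMessage

// Append one length-prefixed record: [int32 size][serialized message].
void WriteBlob(std::ofstream& out, const MyMessage& msg) {
    const std::string blob = msg.SerializeAsString();
    const std::int32_t size = static_cast<std::int32_t>(blob.size());
    // Raw-byte write of the size; assumes a little-endian host (see below).
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
}
```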
I have an implementation that works on GitHub:
src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp
But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,
Questions:
- I'm using `reinterpret_cast<char*>` to read/write the 32-bit integers to and from the binary `fstream`. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an `int` is indeed 4 bytes. Is there a better way to read/write a 32-bit integer to a binary `fstream` given these two limiting assumptions? (See the first sketch after this list.)
- In reading from `fstream`, I create a temporary fixed-length `char` buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using `ParseFromArray`, as `ParseFromIstream` will consume the entire stream. I'd really prefer just to tell the library to read at most the next `N` bytes from the `fstream`, but there doesn't seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most `N` bytes of an `fstream`? (See the second sketch after this list.) Or is my design sufficiently upside down that I should consider a different approach entirely?
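On the first question: one way to drop the little-endian assumption entirely is to (de)serialize the size byte-by-byte with shifts, so the on-disk format is little-endian no matter what the host is. A minimal sketch (my own naming, not code from the repo):

```cpp
#include <cstdint>
#include <istream>
#include <ostream>

// Write a 32-bit value as four little-endian bytes, independent of host endianness.
void WriteLE32(std::ostream& out, std::uint32_t v) {
    const char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF),
    };
    out.write(bytes, 4);
}

// Read four little-endian bytes back into a 32-bit value; false on short read.
bool ReadLE32(std::istream& in, std::uint32_t& v) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), 4)) return false;
    v =  static_cast<std::uint32_t>(bytes[0])
      | (static_cast<std::uint32_t>(bytes[1]) << 8)
      | (static_cast<std::uint32_t>(bytes[2]) << 16)
      | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}
```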
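On the second question: protobuf's lower-level I/O layer looks like it can do the "read at most the next `N` bytes" part directly. `google::protobuf::io::CodedInputStream` has `PushLimit`/`PopLimit`, which cap how far a nested parse may read, so the temporary buffer could go away entirely. A sketch of reading one record that way, assuming the little-endian size prefix above (untested against my repo):

```cpp
#include <istream>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

#include "my_message.pb.h"  // hypothetical generated header for MyMessage

// Read one [int32 size][blob] record without an intermediate copy.
bool ReadBlob(std::istream& in, MyMessage& msg) {
    google::protobuf::io::IstreamInputStream raw(&in);
    google::protobuf::io::CodedInputStream coded(&raw);

    google::protobuf::uint32 size = 0;
    if (!coded.ReadLittleEndian32(&size)) return false;  // EOF or I/O error

    // Restrict the parse to exactly `size` bytes of the stream.
    google::protobuf::io::CodedInputStream::Limit limit =
        coded.PushLimit(static_cast<int>(size));
    const bool ok = msg.ParseFromCodedStream(&coded) &&
                    coded.ConsumedEntireMessage();
    coded.PopLimit(limit);
    return ok;
}
```

In a real loop the two stream wrappers should be constructed once and kept alive across records, since `IstreamInputStream` buffers its reads.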
Edit:
- @codymanix: I'm casting to `char` since `istream::read` requires a `char` array, if I'm not mistaken. I'm also not using the extraction operator `>>`, since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
- @Martin York: Removed `new`/`delete` in favor of `std::vector<char>` (condensed sketch below). `glob.cpp` is now updated. Thanks!
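For context, the buffer-based read path now looks roughly like this (a condensed sketch, not the exact code in `glob.cpp`):

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <vector>

#include "my_message.pb.h"  // hypothetical generated header for MyMessage

// Read `size` bytes into a std::vector<char> and decode with ParseFromArray.
bool ReadBlobBuffered(std::istream& in, std::int32_t size, MyMessage& msg) {
    if (size <= 0) return false;  // guard against a corrupt size prefix
    std::vector<char> buf(static_cast<std::size_t>(size));
    if (!in.read(&buf[0], size)) return false;  // short read or I/O error
    return msg.ParseFromArray(&buf[0], static_cast<int>(size));
}
```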