Background:

I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it's recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart's content.

[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
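
For concreteness, a minimal sketch of the write side of this layout might look like the following. The function name is illustrative (not from the repository), and the length field is written with the same reinterpret_cast approach discussed in the questions below:

    #include <fstream>
    #include <string>
    #include <google/protobuf/message_lite.h>

    // Append one record: a 4-byte length ("offset") followed by the blob.
    // Assumes sizeof(int) == 4 and a little-endian machine, as noted below.
    void append_blob(std::ofstream& out, const google::protobuf::MessageLite& msg) {
        std::string blob;
        msg.SerializeToString(&blob);               // protobuf -> raw bytes
        int size = static_cast<int>(blob.size());
        out.write(reinterpret_cast<const char*>(&size), sizeof(size));
        out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    }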

I have an implementation that works on GitHub:

src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp

But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,

Questions:

  • I'm using reinterpret_cast<char*> to read/write the 32-bit integers to and from the binary fstream. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32-bit integer to a binary fstream given these two limiting assumptions?
  • When reading from the fstream, I create a temporary fixed-length char buffer so that I can pass it to the protobuf library to decode with ParseFromArray, since ParseFromIstream would consume the entire stream. I'd really prefer to just tell the library to read at most the next N bytes from the fstream, but protobuf doesn't seem to offer that. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely? (The read path is sketched below this list.)
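
For reference, a rough sketch of that read path; the function name is illustrative and this is not the actual glob.cpp code:

    #include <fstream>
    #include <vector>
    #include <google/protobuf/message_lite.h>

    // Read one record: the 4-byte length, then exactly that many bytes,
    // which are handed to ParseFromArray. Returns false at EOF or on error.
    bool read_blob(std::ifstream& in, google::protobuf::MessageLite& msg) {
        int size = 0;
        if (!in.read(reinterpret_cast<char*>(&size), sizeof(size)))
            return false;                           // clean EOF or short read
        if (size <= 0)
            return false;                           // corrupt or empty record
        std::vector<char> buffer(size);
        if (!in.read(&buffer[0], size))
            return false;                           // truncated blob
        return msg.ParseFromArray(&buffer[0], size);
    }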

Edit:

  • @codymanix: I'm casting to char since istream::read requires a char array if I'm not mistaken. I'm also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
  • @Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!
+2  A: 

Don't use new[]/delete[].

Instead use std::vector, as deallocation is guaranteed in the event of exceptions.

Don't assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.
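
For example, an illustrative helper (not from the answer or the repository) that checks gcount() after the read:

    #include <cstddef>
    #include <istream>
    #include <vector>

    // Read exactly `size` bytes, failing if the stream returns fewer.
    bool read_exactly(std::istream& in, std::vector<char>& buffer, std::streamsize size) {
        if (size < 0)
            return false;
        buffer.resize(static_cast<std::size_t>(size));
        if (size == 0)
            return true;
        in.read(&buffer[0], size);
        return in.gcount() == size;   // a short read means truncated input
    }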

Rather than have Glob implement both input and output depending on a switch in the constructor, implement two specialized classes, like ifstream/ofstream. This will simplify both the interface and the usage.
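
A rough sketch of that split (the class names follow the iglob/oglob naming mentioned in the comments below; the interfaces are hypothetical, not taken from the answer or the repository):

    #include <fstream>
    #include <string>
    #include <google/protobuf/message_lite.h>

    // Hypothetical interfaces only, mirroring the ifstream/ofstream split.
    class oglob {
    public:
        explicit oglob(const std::string& path)
            : out_(path.c_str(), std::ios::binary) {}
        bool write(const google::protobuf::MessageLite& msg);  // length prefix + blob
    private:
        std::ofstream out_;
    };

    class iglob {
    public:
        explicit iglob(const std::string& path)
            : in_(path.c_str(), std::ios::binary) {}
        bool read(google::protobuf::MessageLite& msg);          // false at end of file
    private:
        std::ifstream in_;
    };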

Martin York
Martin - thank you. I implemented the first 2 changes, and I'm working on the 3rd. Do you have any comment on the use of `reinterpret_cast`? And do you believe reading fstream into a std::vector<char> is the best design choice given the ParseFromArray / ParseFromIstream interface of protobuf?
Nicholas Palko
Personally I have no problem with reinterpret_cast<> in this situation. I think it aids documentation and gives the reader the required information that this is a dangerous cast. BUT: I know that I am in the minority on this subject, and most people would advise you to use static_cast, as it has a defined meaning in the standard while reinterpret_cast is implementation-defined.
Martin York
Reading into a vector: I cannot see that you have any choice, short of branching your own version of protocol buffers (or getting an appropriate API change accepted into the main branch). It would have been nice to get a buffer from the message and write directly into that (or to provide a max length to the stream-reading API).
Martin York
Thanks! I also broke glob up into iglob/oglob. This was a great learning experience for me, so thanks again for your time!
Nicholas Palko