I'm trying to read and write multiple Protocol Buffers messages from files, in both C++ and Java. Google suggests writing a length prefix before each message, but as far as I can see there's no built-in way to do that.

However, the Java API in version 2.1.0 received a set of "Delimited" I/O functions which apparently do that job:

parseDelimitedFrom
mergeDelimitedFrom
writeDelimitedTo

Are there C++ equivalents? And if not, what's the wire format for the size prefixes the Java API attaches, so I can parse those messages in C++?

A: 

You can use getline for reading a string from a stream, using the specified delimiter:

istream& getline ( istream& is, string& str, char delim );

(defined in the <string> header)

Jan
Not the same thing; Protocol Buffers is a binary format, and the "Delimited" functions actually just prepend a size. I'd need to know the format of the size prefix.
tzaman
+2  A: 

Okay, so I haven't been able to find top-level C++ functions implementing what I need, but some spelunking through the Java API reference turned up the following, inside the MessageLite interface:

void writeDelimitedTo(OutputStream output)
/*  Like writeTo(OutputStream), but writes the size of 
    the message as a varint before writing the data.   */

So the Java size prefix is a (Protocol Buffers) varint!
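For reference, a varint stores an integer in 7-bit groups, least significant group first, with the high bit of each byte set when more bytes follow. Here is a minimal sketch of that encoding in plain C++ with no protobuf dependency. The helper names appendVarint32 and parseVarint32 are made up for illustration and are not part of any protobuf API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append `value` to `out` as a base-128 varint,
// least significant 7-bit group first.
void appendVarint32(std::uint32_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        // Low 7 bits, with the high bit set to signal "more bytes follow".
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));  // last byte, high bit clear
}

// Hypothetical helper: decode a varint from data[0..size).
// Returns the number of bytes consumed, or 0 if the input is truncated
// or the varint is longer than the 5 bytes a 32-bit value can need.
std::size_t parseVarint32(const std::uint8_t* data, std::size_t size,
                          std::uint32_t& value) {
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < size && i < 5; ++i) {
        result |= static_cast<std::uint32_t>(data[i] & 0x7F) << (7 * i);
        if ((data[i] & 0x80) == 0) {
            value = result;
            return i + 1;
        }
    }
    return 0;
}
```

With this sketch, a size of 300 comes out as the two bytes 0xAC 0x02, and any size below 128 fits in a single byte.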

Armed with that information, I went digging through the C++ API and found the coded_stream.h header (google/protobuf/io/coded_stream.h), which declares these:

bool CodedInputStream::ReadVarint32(uint32 * value)
void CodedOutputStream::WriteVarint32(uint32 value)

Using those, I should be able to roll my own C++ functions that do the job.
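To show the shape such functions might take, here is a rough sketch of the framing over standard iostreams. It deliberately avoids the protobuf headers so it compiles on its own: the std::string payload stands in for a serialized message (what SerializeToString would produce and ParseFromString would consume), and writeDelimited / readDelimited are hypothetical names, not library functions. It assumes payloads are smaller than 4 GiB and trusts the length it reads, so treat it as an illustration rather than hardened code:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write one frame, i.e. varint(length) then the payload.
// `payload` stands in for an already serialized message.
bool writeDelimited(std::ostream& out, const std::string& payload) {
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    while (n >= 0x80) {
        out.put(static_cast<char>((n & 0x7F) | 0x80));  // continuation bit set
        n >>= 7;
    }
    out.put(static_cast<char>(n));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: read one frame into `payload`.
// Returns false at end of stream, on a truncated frame, or on a
// length prefix longer than 5 bytes.
bool readDelimited(std::istream& in, std::string& payload) {
    std::uint32_t length = 0;
    bool done = false;
    for (int shift = 0; shift <= 28 && !done; shift += 7) {
        const int c = in.get();
        if (c == std::char_traits<char>::eof()) return false;
        length |= static_cast<std::uint32_t>(c & 0x7F) << shift;
        done = (c & 0x80) == 0;
    }
    if (!done) return false;

    payload.resize(length);
    if (length == 0) return true;  // empty message: nothing more to read
    in.read(&payload[0], static_cast<std::streamsize>(length));
    return static_cast<bool>(in);  // read() sets failbit on a short read
}
```

Writing a few payloads to a std::stringstream and reading them back in a loop until readDelimited returns false round-trips them in order. With real files, the streams should be opened with std::ios::binary.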

They should really add this to the main Message API though; it's missing functionality considering Java has it, and so does Marc Gravell's excellent protobuf-net C# port (via SerializeWithLengthPrefix and DeserializeWithLengthPrefix).

tzaman
Yes. This is the way I solved this problem. I added another answer with some sample pseudo code for writing a message.
Yukiko
+1  A: 

I solved the same problem using CodedOutputStream/ArrayOutputStream to write the message (with the size) and CodedInputStream/ArrayInputStream to read the message (with the size).

For example, the following pseudo-code writes the message size followed by the message:

const unsigned bufLength = 256;
unsigned char buffer[bufLength];
Message protoMessage;

google::protobuf::io::ArrayOutputStream arrayOutput(buffer, bufLength);
google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);

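// Write the size as a varint (the same prefix Java's writeDelimitedTo uses),
// then the serialized message itself.
codedOutput.WriteVarint32(protoMessage.ByteSize());
protoMessage.SerializeToCodedStream(&codedOutput);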

When writing you should also check that your buffer is large enough to fit the message (including the size). And when reading, you should check that your buffer contains a whole message (including the size).
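Both checks can be sketched in a few lines of plain C++, assuming the size prefix is a varint as in the Java "Delimited" format. The function names here are hypothetical. I believe protobuf's CodedOutputStream offers a static VarintSize32 for the first one, but the version below has no library dependency:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes a 32-bit value occupies as a varint.
std::size_t varintSize32(std::uint32_t value) {
    std::size_t bytes = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Writing side: total buffer space needed for the prefix plus the message.
std::size_t framedSize(std::uint32_t messageSize) {
    return varintSize32(messageSize) + messageSize;
}

// Reading side: does buf[0..len) hold a complete prefix and message?
bool hasCompleteFrame(const std::uint8_t* buf, std::size_t len) {
    std::uint32_t messageSize = 0;
    std::size_t i = 0;
    for (;; ++i) {
        // Prefix truncated, or longer than a 32-bit varint can be.
        if (i >= len || i >= 5) return false;
        messageSize |= static_cast<std::uint32_t>(buf[i] & 0x7F) << (7 * i);
        if ((buf[i] & 0x80) == 0) break;
    }
    const std::size_t prefixBytes = i + 1;
    return len - prefixBytes >= messageSize;
}
```

So before writing you would compare framedSize(protoMessage.ByteSize()) against the buffer length, and before parsing you would call hasCompleteFrame on the bytes received so far.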

It would definitely be handy if they added convenience methods to the C++ API similar to those provided by the Java API.

Yukiko
I'll be using an underlying `OstreamOutputStream`, so the length-checking won't be necessary, but thanks for the answer. :) In your case, I'd probably go with setting the `bufLength` to `protoMessage.ByteSize()` plus some extra for the size prefix.
tzaman