I need to serialize a .NET DateTime value in a protocol buffers message.

My plan is to use DateTime.ToBinary() and then pass the returned 64-bit value in the message. But I'm not sure which protocol buffers data type to choose to represent it.

I guess I'm confused about when the fixed64 (or sfixed64) data types should be used.

I'm assuming in this scenario I would use the signed types since the values returned by DateTime.ToBinary() can be negative as well as positive.

A: 

You should use a signed 64-bit number, not because the DateTime can be negative, but because the ToBinary method returns an Int64, which is a signed 64-bit number.

Guffa
Thanks for the answer Guffa, but that doesn't really answer my question. I'm interested in whether or not the "fixed64" data types should be used in this case. With an explanation as to why if possible!
Miky Dinescu
+2  A: 

Well, you definitely want either int64 or sfixed64 to cope with the value being signed.

I've just done a quick test: DateTime.Now.ToBinary() is encoded in 10 bytes using int64, whereas sfixed64 will always use 8 bytes. Basically, the variable-length encoding is great for small numbers, but becomes bigger than the fixed encoding for large numbers. (It's the same kind of tradeoff as using UTF-8 instead of UTF-16 - ASCII characters can be encoded in UTF-8 in a single byte, but later code points end up being encoded as 2 and then 3 bytes, whereas UTF-16 uses 2 bytes for every character in the Basic Multilingual Plane.)
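To make the size tradeoff concrete, here is a minimal sketch in Python (chosen only as a neutral illustration language; it does not use any protobuf library). The helper names are hypothetical, the sample value is a stand-in rather than a real ToBinary() result, and the sizes exclude the field tag byte.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def varint_size(value: int) -> int:
    """Bytes needed to varint-encode a 64-bit value (the int64 wire encoding)."""
    value &= MASK64  # negatives are sent as their 64-bit two's complement
    size = 1
    while value >= 0x80:  # each varint byte carries 7 payload bits
        value >>= 7
        size += 1
    return size

def zigzag(value: int) -> int:
    """ZigZag mapping used by sint64: small magnitudes map to small unsigned values."""
    return ((value << 1) ^ (value >> 63)) & MASK64

def sfixed64_size(value: int) -> int:
    """sfixed64 is a plain little-endian 64-bit integer, so always 8 bytes."""
    return len(struct.pack("<q", value))

# Stand-in for a large negative 64-bit value (an assumption, not a real ToBinary() output).
sample = -(1 << 62)

sizes = {
    "int64": varint_size(sample),           # any negative int64 needs the full 10 bytes
    "sint64": varint_size(zigzag(sample)),  # ZigZag helps only when the magnitude is small
    "sfixed64": sfixed64_size(sample),
}
```

For values like this the fixed encoding is the smallest of the three; for small values such as 1 or -1 the variable-length encodings win.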

My guess is that DateTime.ToBinary() values are likely to be quite large (without knowing the details of exactly what it does) so sfixed64 is more appropriate.

Does that make sense?

Jon Skeet
So basically the sfixed64 is a better fit in this case because it uses fixed-length encoding vs. sint64 which is variable-length and therefore suited for smaller values. Am I right?
Miky Dinescu
Yup, that's exactly right.
Jon Skeet
+1  A: 

In protobuf-net, I use a graduated scale approach (and indeed, it handles all this for you if you simply use DateTime) - the equivalent .proto is something like this:

message DateTime {
  optional sint64 value = 1; // the offset (in units of the selected scale)
                             // from 1970/01/01
  optional TimeSpanScale scale = 2 [default = DAYS]; // the scale of the
                                                     // timespan
  enum TimeSpanScale {
    DAYS = 0;
    HOURS = 1;
    MINUTES = 2;
    SECONDS = 3;
    MILLISECONDS = 4;

    MINMAX = 15; // dubious
  }
}

i.e. if the DateTime can be expressed in whole days, I just send the number of days since 1970, and so on for the finer scales - plus a small marker for the scale. This means that dates can be sent a bit more efficiently, but it doesn't really cost much more for other scales.
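The scale selection described above could be sketched roughly as follows. This is not protobuf-net's actual code: it is a Python illustration with hypothetical helper names (`to_scaled`, `from_scaled`) that only assumes the scale names from the enum shown earlier and picks the coarsest scale that represents the value exactly.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Scale names mirror the TimeSpanScale enum; ordered from coarsest to finest.
SCALES = [
    ("DAYS", timedelta(days=1)),
    ("HOURS", timedelta(hours=1)),
    ("MINUTES", timedelta(minutes=1)),
    ("SECONDS", timedelta(seconds=1)),
    ("MILLISECONDS", timedelta(milliseconds=1)),
]

def to_scaled(dt: datetime):
    """Return (value, scale) using the coarsest scale that represents dt exactly."""
    offset = dt - EPOCH
    for name, unit in SCALES:
        if offset % unit == timedelta(0):  # exact multiple of this unit?
            return offset // unit, name
    raise ValueError("finer than one millisecond; not representable in this sketch")

def from_scaled(value: int, scale: str) -> datetime:
    """Inverse of to_scaled: rebuild the datetime from the offset and its scale."""
    return EPOCH + value * dict(SCALES)[scale]

value, scale = to_scaled(datetime(1970, 1, 2))  # (1, "DAYS")
```

A date-only value therefore travels as a small sint64 plus the default scale, while a timestamp with milliseconds falls through to the finest scale.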

Personally, I wouldn't use ToBinary() - I would explicitly use an offset of a known scale from a known epoch (such as the Unix epoch). This makes it more portable between platforms. But if you are sending (for example) just the millisecond offset, then a fixed-width encoding (sfixed64) may be more efficient than a variable-length one (sint64) once the values are large enough. Whether you need signed or unsigned depends on whether you need dates before your epoch ;-p
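As a rough illustration of the explicit-epoch idea, the sketch below converts a timestamp to whole milliseconds since the Unix epoch and packs it as a fixed 8-byte little-endian integer, which is the layout sfixed64 uses for its payload. It is written in Python purely as a neutral example, and the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_unix_millis(dt: datetime) -> int:
    """Whole milliseconds from the Unix epoch to an aware datetime (negative before 1970)."""
    return (dt - UNIX_EPOCH) // timedelta(milliseconds=1)

def from_unix_millis(ms: int) -> datetime:
    """Inverse conversion, always returning a UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)

def pack_sfixed64(value: int) -> bytes:
    """Signed 64-bit little-endian payload, always 8 bytes."""
    return struct.pack("<q", value)

def unpack_sfixed64(data: bytes) -> int:
    return struct.unpack("<q", data)[0]

moment = datetime(2009, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
payload = pack_sfixed64(to_unix_millis(moment))
restored = from_unix_millis(unpack_sfixed64(payload))
```

Because the epoch and unit are stated explicitly, any platform can decode the value without knowing how .NET lays out its DateTime bits. The signed type matters only if dates before 1970 have to round-trip.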

Marc Gravell
Marc, thank you for the insight. In fact I had already moved on to an approach very similar to yours (using the Unix epoch), but I was still curious about the correct usage of the fixed64/sfixed64 types in protocol buffers. I guess you could say my question was a bit academic, because I was really interested in the applicability of the various types available for encoding data with Protocol Buffers.
Miky Dinescu