What options do I have for defining Python classes that reflect this pattern (adopted from RFC 2246 [TLS]), so that I can instantiate and keep them and serialize/deserialize them to and from binary data?

   enum { apple, orange } VariantTag;
   struct {
       uint16 number;
       opaque string<0..10>; /* variable length */
   } V1;
   struct {
       uint32 number;
       opaque string[10];    /* fixed length */
   } V2;
   struct {
       select (VariantTag) { /* value of selector is implicit */
           case apple: V1;   /* VariantBody, tag = apple */
           case orange: V2;  /* VariantBody, tag = orange */
       } variant_body;       /* optional label on variant */
   } VariantRecord;

Basically, I would have to define a (variant) class VariantRecord whose structure varies with the value of VariantTag. That part is not difficult. The challenge is to find the most generic way to build a class that serializes to and deserializes from a byte stream. Pickle, Google Protocol Buffers, and marshal are not options.
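A minimal sketch of one more generic approach, in Python 3 and under my own assumptions: describe each struct declaratively as an ordered list of typed fields, and let one generic pack/unpack routine walk that list. The class names UInt, OpaqueFixed, OpaqueVar, Record and Select are hypothetical, not from any library. Following my reading of RFC 2246 section 4.3, a variable-length vector's length prefix is as wide as needed to hold its ceiling (1 byte for <0..10>), and a fixed-length vector has no prefix. Because the selector is implicit, the tag is passed in rather than read from the stream.

```python
class UInt:
    """TLS uintN: big-endian, `size` bytes wide (uint16 -> UInt(2))."""
    def __init__(self, size):
        self.size = size

    def pack(self, value):
        return value.to_bytes(self.size, 'big')

    def unpack(self, buf, off):
        end = off + self.size
        if end > len(buf):
            raise ValueError('truncated integer')
        return int.from_bytes(buf[off:end], 'big'), end


class OpaqueFixed:
    """opaque name[n]: exactly n bytes, no length prefix."""
    def __init__(self, n):
        self.n = n

    def pack(self, value):
        if len(value) != self.n:
            raise ValueError('expected %d bytes' % self.n)
        return bytes(value)

    def unpack(self, buf, off):
        end = off + self.n
        if end > len(buf):
            raise ValueError('truncated vector')
        return bytes(buf[off:end]), end


class OpaqueVar:
    """opaque name<floor..ceiling>: length prefix, then the data bytes."""
    def __init__(self, floor, ceiling):
        self.floor, self.ceiling = floor, ceiling
        # Prefix is as wide as needed to hold the ceiling (1 byte for 10).
        self.length = UInt(max(1, (ceiling.bit_length() + 7) // 8))

    def pack(self, value):
        if not self.floor <= len(value) <= self.ceiling:
            raise ValueError('length %d out of range' % len(value))
        return self.length.pack(len(value)) + bytes(value)

    def unpack(self, buf, off):
        n, off = self.length.unpack(buf, off)
        if not self.floor <= n <= self.ceiling:
            raise ValueError('length %d out of range' % n)
        return OpaqueFixed(n).unpack(buf, off)


class Record:
    """struct { ... }: fields are packed in declaration order."""
    def __init__(self, *fields):
        self.fields = fields  # (name, field_type) pairs

    def pack(self, values):
        return b''.join(ftype.pack(values[name]) for name, ftype in self.fields)

    def unpack(self, buf, off=0):
        values = {}
        for name, ftype in self.fields:
            values[name], off = ftype.unpack(buf, off)
        return values, off  # decoded fields and the offset after the record


class Select:
    """select (tag) { ... }: the tag itself is not written to the stream."""
    def __init__(self, cases):
        self.cases = cases  # tag -> Record

    def pack(self, tag, values):
        return self.cases[tag].pack(values)

    def unpack(self, tag, buf, off=0):
        return self.cases[tag].unpack(buf, off)


V1 = Record(('number', UInt(2)), ('string', OpaqueVar(0, 10)))
V2 = Record(('number', UInt(4)), ('string', OpaqueFixed(10)))
VariantRecord = Select({'apple': V1, 'orange': V2})

data = VariantRecord.pack('apple', {'number': 10, 'string': b'Hello'})
print(data.hex())                           # 000a0548656c6c6f
print(VariantRecord.unpack('apple', data))  # ({'number': 10, 'string': b'Hello'}, 8)
```

Adding another struct then means writing a field table rather than another hand-written serialize method. Values are plain dicts here to keep the sketch short; the same tables could just as well populate object attributes.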

I had limited success with an explicit "def serialize" method in my class, and I'm not very happy with it because it isn't generic enough.

I hope I have expressed the problem clearly.

My current solution for the case VariantTag = apple looks like this, but I don't like it much:

import binascii
import struct

class VariantRecord(object):
  def __init__(self, number, opaque):
    self.number = number
    self.opaque = opaque
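  def serialize(self):
    # '>' = big-endian (network order), H = uint16 number,
    # B = 1-byte length prefix, %ds = the opaque bytes themselves
    out = struct.pack('>HB%ds' % len(self.opaque), self.number, len(self.opaque), self.opaque)
    return out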


v = VariantRecord(10, 'Hello')
print binascii.hexlify(v.serialize())

>> 000a0548656c6c6f

Regards
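The class above only covers serialization. A minimal Python 3 sketch of the reverse direction for the same apple/V1 layout (uint16 number, 1-byte length, then the data) could look like this; the HEADER attribute and the deserialize classmethod are hypothetical additions, and the opaque field is bytes rather than str in Python 3.

```python
import binascii
import struct

class VariantRecord(object):
    """V1 (tag = apple): uint16 number, opaque string<0..10>."""
    HEADER = struct.Struct('>HB')  # uint16 number + 1-byte length prefix

    def __init__(self, number, opaque):
        self.number = number
        self.opaque = opaque

    def serialize(self):
        return self.HEADER.pack(self.number, len(self.opaque)) + self.opaque

    @classmethod
    def deserialize(cls, data):
        number, length = cls.HEADER.unpack_from(data, 0)
        start = cls.HEADER.size
        opaque = data[start:start + length]
        if len(opaque) != length:
            raise ValueError('truncated record')
        return cls(number, opaque)

blob = VariantRecord(10, b'Hello').serialize()
print(binascii.hexlify(blob).decode())  # 000a0548656c6c6f
w = VariantRecord.deserialize(blob)
print(w.number, w.opaque)               # 10 b'Hello'
```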

A: 

Two suggestions:

  1. For the variable-length structure, use a fixed format and just slice the result.
  2. Use struct.Struct.

For example, if I've understood your formats correctly (is the length byte that appeared in your example, but wasn't mentioned originally, also present in the other variant?):

>>> import binascii
>>> import struct
>>> V1 = struct.Struct(">H10p")
>>> V2 = struct.Struct(">L10p")
>>> def serialize(variant, n, s):
    if variant:
        return V2.pack(n,s)
    else:
        return V1.pack(n,s)[:len(s)+3]


>>> print binascii.hexlify(serialize(False, 10, 'hello')) #V1
000a0568656c6c6f
>>> print binascii.hexlify(serialize(True, 10, 'hello')) #V2
0000000a0568656c6c6f00000000
>>> 
Duncan
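The slicing trick above only covers packing. A hedged Python 3 sketch of the inverse, reusing the same two Struct layouts: pad a sliced V1 record back to the full fixed size so that unpack() accepts it, and let the 'p' length byte decide how many data bytes are real. The deserialize function is a hypothetical helper, not part of the answer, and it assumes data holds exactly one record.

```python
import struct

# Same layouts as in the answer (Python 3, so strings are bytes).
V1 = struct.Struct('>H10p')
V2 = struct.Struct('>L10p')

def serialize(variant, n, s):
    if variant:
        return V2.pack(n, s)
    # Keep only the 2-byte number, the 1-byte length and len(s) data bytes.
    return V1.pack(n, s)[:len(s) + 3]

def deserialize(variant, data):
    layout = V2 if variant else V1
    # Re-pad to the fixed size; 'p' reads only as many bytes as its length byte says.
    return layout.unpack(data.ljust(layout.size, b'\0'))

rec = serialize(False, 10, b'hello')
print(rec.hex())                # 000a0568656c6c6f
print(deserialize(False, rec))  # (10, b'hello')
print(deserialize(True, serialize(True, 10, b'hello')))  # (10, b'hello')
```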
Hmm. Thanks for the hint, but V1 isn't serialized correctly. The "<0..10>" notation is adopted from the TLS RFC: each variable-length array of bytes must be preceded by an element indicating its _current_ length. For V1, the expected result should look like my sample.
neil
I can't see how my V1 result differs from yours (except that I used a lowercase h in 'hello'). The current length is included in the result: that's what the 'p' format does in struct.
Duncan
Ah, OK. I wasn't aware of the "p" format. Thanks.
neil
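One caveat on 'p', per the Python struct documentation: the count is the total number of bytes stored, including the length byte, so '10p' keeps at most 9 data bytes and silently truncates longer input. A short Python 3 check, plus layouts adjusted under two assumptions: '>H11p' so that V1 covers the full <0..10> range, and '>L10s' for V2, since a fixed-length TLS vector such as opaque string[10] carries no length prefix.

```python
import struct

data = b'0123456789'  # 10 bytes: the maximum allowed by opaque string<0..10>

# 'p' stores `count` bytes in total: 1 length byte + up to count-1 data bytes.
packed10 = struct.pack('>10p', data)
packed11 = struct.pack('>11p', data)
print(packed10[0], packed10[1:])  # 9 b'012345678'   (last byte dropped)
print(packed11[0], packed11[1:])  # 10 b'0123456789'

# Assumed layouts: V1 sized for the whole <0..10> range,
# V2 with a fixed 10-byte vector and no length prefix.
V1 = struct.Struct('>H11p')
V2 = struct.Struct('>L10s')
print(V1.pack(10, data)[:len(data) + 3].hex())  # 000a0a30313233343536373839
print(V2.pack(10, data).hex())                  # 0000000a30313233343536373839
```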