Here is a contrived example of how a lot of our classes return binary representations (to be read by C++) of themselves.

def to_binary(self):
    'Return the binary representation as a string.'
    data = []

    # Binary version number.
    data.append(struct.pack('<I', 2))

    # Image size.
    data.append(struct.pack('<II', *self.image.size))

    # Attribute count.
    data.append(struct.pack('<I', len(self.attributes)))

    # Attributes.
    for attribute in self.attributes:

        # Id.
        data.append(struct.pack('<I', attribute.id))

        # Type.
        data.append(struct.pack('<H', attribute.type))

        # Extra Type.        
        if attribute.type == 0:
            data.append(struct.pack('<I', attribute.typeEx))

    return ''.join(data)

What I dislike:

  • Every line starts with data.append(struct.pack(, distracting from the unique part of the line.
  • The byte order ('<') is repeated over and over again.
  • You have to remember to return the boilerplate ''.join(data).

What I like:

  • The format specifiers appear near the attribute name. E.g., it's easy to see that self.image.size is written out as two unsigned ints.
  • The lines are (mostly) independent. E.g., to remove the Id field from an 'attribute', you don't have to touch more than one line of code.

Is there a more readable/pythonic way to do this?

+3  A: 

You can try to implement some sort of declarative syntax for your data, which may result in something like:

class Image(SomeClassWithMetamagic):
    type = PackedValue(2)
    attribute = PackedValue('attributes') # accessed via self.__dict__

    # or using decorators
    @pack("<II")
    def get_size(self):
        pass

    # and a generic function in the superclass
    def get_packed(self):
        pass  # pack every declared value here

etc...

Other examples of declarative syntax are SQLAlchemy's declarative_base, ToscaWidgets and sprox.
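
As a rough illustration of the declarative idea, here is a minimal sketch that avoids metaclasses altogether. The names `Field`, `Packable`, `layout` and `to_binary` are hypothetical and not taken from any of the libraries mentioned above. `b''.join` is used so that it also runs on Python 3, where `struct.pack` returns bytes.

```python
import struct


class Field(object):
    """Hypothetical field description: a struct format plus a value getter."""

    def __init__(self, fmt, getter):
        self.fmt = fmt          # struct format without byte order, e.g. 'II'
        self.getter = getter    # callable(obj) returning a tuple of values


class Packable(object):
    """Hypothetical base class: packs the fields listed in `layout`, in order."""

    byte_order = '<'            # stated once instead of on every line
    layout = ()

    def to_binary(self):
        return b''.join(
            struct.pack(self.byte_order + field.fmt, *field.getter(self))
            for field in self.layout
        )


class Image(Packable):
    layout = (
        Field('I', lambda self: (2,)),                # binary version number
        Field('II', lambda self: tuple(self.size)),   # image size
    )

    def __init__(self, size):
        self.size = size
```

With this sketch, `Image((640, 480)).to_binary()` yields the version number followed by the two size fields. Variable-length parts such as the attribute list would need extra machinery that is not shown here.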

ebo
The declarative syntax is good if you don't need complex programmatic logic to build the serialization (i.e. lots of ifs and fors). I have used the declarative approach to specify the serialization, deserialization and automatically generated documentation in one go for a binary file format.
Ants Aasma
A: 

You could refactor your code to wrap boilerplate in a class. Something like:

def to_binary(self):
    'Return the binary representation as a string.'
    binary = BinaryWrapper()

    # Binary version number.
    binary.pack('<I', 2)

    # alternatively, you can pass an array
    stuff = [
        ('<II', *self.image.size),          # Image size.
        ('<I', len(self.attributes)),       # Attribute count
    ]
    binary.pack_all(stuff)

    return binary.get_packed()
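
`BinaryWrapper` is not defined above. One possible shape for it, as a sketch only: the method names follow the usage in the snippet, and everything else is an assumption.

```python
import struct


class BinaryWrapper(object):
    """Hypothetical helper that collects packed chunks."""

    def __init__(self):
        self._chunks = []

    def pack(self, fmt, *values):
        # Pack one group of values and remember the resulting bytes.
        self._chunks.append(struct.pack(fmt, *values))

    def pack_all(self, items):
        # Each item is a tuple of the form (format, value, value, ...).
        for item in items:
            self.pack(*item)

    def get_packed(self):
        # b'' works on Python 3 and on Python 2.6+.
        return b''.join(self._chunks)
```

This hides the `data.append(...)` and `''.join(data)` boilerplate, but the byte order is still repeated in every format string unless `pack` prepends it.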
gabor
+1  A: 
def to_binary(self):
    struct_i_pack = struct.Struct('<I').pack
    struct_ii_pack = struct.Struct('<II').pack
    struct_h_pack = struct.Struct('<H').pack
    struct_ih_pack = struct.Struct('<IH').pack
    struct_ihi_pack = struct.Struct('<IHI').pack

    return ''.join([
        struct_i_pack(2),
        struct_ii_pack(*self.image.size),
        struct_i_pack(len(self.attributes)),
        ''.join([
            struct_ih_pack(a.id, a.type) if a.type else struct_ihi_pack(a.id, a.type, a.typeEx)
            for a in self.attributes
        ])
    ])
mtasic
+3  A: 
from StringIO import StringIO
import struct

class BinaryIO(StringIO):
    def writepack(self, fmt, *values):
        self.write(struct.pack('<' + fmt, *values))

def to_binary_example():
    data = BinaryIO()
    data.writepack('I', 42)
    data.writepack('II', 1, 2)
    return data.getvalue()
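
The `StringIO` module exists only in Python 2. On Python 3, where `struct.pack` returns bytes, the same idea could presumably be built on `io.BytesIO` instead. A sketch under that assumption:

```python
import io
import struct


class BinaryIO(io.BytesIO):
    def writepack(self, fmt, *values):
        # The little-endian prefix is added once here, not at every call site.
        self.write(struct.pack('<' + fmt, *values))


def to_binary_example():
    data = BinaryIO()
    data.writepack('I', 42)
    data.writepack('II', 1, 2)
    return data.getvalue()
```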
Dave
A: 

The worst problem is that you need corresponding code in C++ to read the output. Can you reasonably arrange to have both the reading and writing code mechanically derive from or use a common specification? How to go about that depends on your C++ needs as much as Python.
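
One way to picture such a common specification, as a sketch only: a single table drives both the Python packing code and a generated C++ declaration. `HEADER_SPEC`, `pack_header` and `cpp_header_struct` are hypothetical names, and only the fixed-size header is covered.

```python
import struct

# Hypothetical shared specification for the fixed-size header:
# (field name, struct format character, C++ type).
HEADER_SPEC = [
    ('version', 'I', 'uint32_t'),
    ('width', 'I', 'uint32_t'),
    ('height', 'I', 'uint32_t'),
    ('attribute_count', 'I', 'uint32_t'),
]


def pack_header(values):
    """Pack a dict of header values in the order given by HEADER_SPEC."""
    fmt = '<' + ''.join(code for _, code, _ in HEADER_SPEC)
    return struct.pack(fmt, *(values[name] for name, _, _ in HEADER_SPEC))


def cpp_header_struct(name='Header'):
    """Emit the text of a C++ struct declaration matching HEADER_SPEC."""
    lines = ['struct %s {' % name]
    lines += ['    %s %s;' % (ctype, field) for field, _, ctype in HEADER_SPEC]
    lines.append('};')
    return '\n'.join(lines)
```

The C++ side would still have to deal with byte order and struct padding itself; the sketch only keeps field order and field widths in one place.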

Darius Bacon
A: 

You can get rid of the repetition and keep the code just as readable, like this:

def to_binary(self):     
    output = struct.pack(
        '<IIII', 2, self.image.size[0], self.image.size[1], len(self.attributes)
    )
    return output + ''.join(
        struct.pack('<IHI', attribute.id, attribute.type, attribute.typeEx)
        for attribute in self.attributes
    )
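
Note that the snippet above always writes typeEx, whereas the code in the question writes it only when attribute.type == 0. A variant that keeps that check might look like the sketch below. The free function signature and the `Attribute` namedtuple are assumptions made to keep the example self-contained.

```python
import struct
from collections import namedtuple

# Hypothetical stand-in for the question's attribute objects.
Attribute = namedtuple('Attribute', 'id type typeEx')


def pack_attribute(attribute):
    # The extra type is written only for type 0, as in the question.
    if attribute.type == 0:
        return struct.pack('<IHI', attribute.id, attribute.type, attribute.typeEx)
    return struct.pack('<IH', attribute.id, attribute.type)


def to_binary(size, attributes):
    header = struct.pack('<IIII', 2, size[0], size[1], len(attributes))
    return header + b''.join(pack_attribute(a) for a in attributes)
```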
Nadia Alramli
I think you missed "if attribute.type == 0:"
Darius Bacon
+2  A: 

If you just want nicer syntax, you can abuse generators/decorators:

import struct
from functools import wraps

def packed(g):
  '''a decorator that packs the list data items
     that is generated by the decorated function
  '''
  @wraps(g)
  def wrapper(*p, **kw):
    data = []
    for params in g(*p, **kw):
      fmt = params[0]
      fields = params[1:]
      data.append(struct.pack('<'+fmt, *fields))
    return ''.join(data)    
  return wrapper

@packed
def as_binary(self):
  '''just |yield|s the data items that should be packed
     by the decorator
  '''
  yield 'I', 2
  yield 'II', self.image.size[0], self.image.size[1]
  yield 'I', len(self.attributes)

  for attribute in self.attributes:
    yield 'I', attribute.id
    yield 'H', attribute.type
    if attribute.type == 0:
      yield 'I', attribute.typeEx

Basically this uses the generator to implement a "monad", an abstraction usually found in functional languages like Haskell. It separates the generation of the values from the code that decides how to combine them. It's more of a functional programming approach than a "pythonic" one, but I think it improves readability.

sth
+1. I was literally just seconds away from posting the exact same solution. One small improvement to enhance readability would be to encapsulate the datatype string in a function, so yield 'I', attribute.id becomes yield UInt(attribute.id).
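
A sketch of that suggestion, assuming hypothetical helpers `UInt` and `UShort` that simply return a (format, value) pair, so that a decorator like the one in the answer can consume them unchanged:

```python
import struct
from functools import wraps


def UInt(value):
    # Hypothetical helper: unsigned 32-bit integer.
    return ('I', value)


def UShort(value):
    # Hypothetical helper: unsigned 16-bit integer.
    return ('H', value)


def packed(g):
    @wraps(g)
    def wrapper(*args, **kwargs):
        # Each yielded item is (format, value, ...); pack little-endian.
        return b''.join(
            struct.pack('<' + item[0], *item[1:]) for item in g(*args, **kwargs)
        )
    return wrapper


@packed
def attribute_as_binary(attr_id, attr_type, type_ex):
    yield UInt(attr_id)
    yield UShort(attr_type)
    if attr_type == 0:
        yield UInt(type_ex)
```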
Ants Aasma
+2  A: 

How about Protocol Buffers, Google's cross-language format and protocol for sharing data?

jb