My present task is to dissect tcpdump data that includes P2P messages, and I am having trouble with the piece data I extract and write to a file on my x86 machine. I suspect I have a simple endianness issue with the bytes I write to the file.

I have a list of bytes holding a piece of P2P video, read and processed using the python-pcapy package.

bytes = [14, 254, 23, 35, 34, 67, etc... ]

I am looking for a way to write these bytes, presently held in a list in my Python application, to a file.

Currently I write the pieces as follows:

def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts): 
    file = open(filename,"ab")
    # Iterate through bytes writing them to a file if don't have piece already 
    if not self.piecemap[ipdst].has_key(pieceindex):
        for byte in bytes: 
            file.write('%c' % byte)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))

    # Remember we have this piece now in case duplicates arrive 
    self.piecemap[ipdst][pieceindex] = True

    # TODO: Collect stats 
    file.close()

As you can see from the for loop, I write the bytes to the file in the same order as I process them from the wire (i.e. network or big-endian order).

Suffice it to say, the video that forms the payload of the pieces does not play back well in VLC :-D

I think I need to convert them to little-endian byte order but am not sure the best way to approach this in Python.

UPDATE

The solution that worked out for me (writing P2P pieces handling endian issues appropriately) was:

def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts): 
    file = open(filename,"r+b")
    if not self.piecemap[ipdst].has_key(pieceindex):
        little = struct.pack('<'+'B'*len(bytes), *bytes) 
        # Seek to offset based on piece index 
        file.seek(pieceindex * self.piecesize)
        file.write(little)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))

    # Remember we have this piece now in case duplicates arrive 
    self.piecemap[ipdst][pieceindex] = True

    file.close()

The key to the solution was the use of Python's struct module, as suspected, and in particular:

    little = struct.pack('<'+'B'*len(bytes), *bytes) 
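Besides the struct call, the updated version differs from the original in another way: it opens the file in "r+b" mode and seeks to pieceindex * piecesize, so each piece lands at its own offset instead of being appended in arrival order. That may matter if pieces arrive out of sequence. A minimal sketch of that behaviour follows; `write_piece`, `PIECE_SIZE` and the temporary file are hypothetical names for illustration, not part of the code above.

```python
import os
import struct
import tempfile

PIECE_SIZE = 4  # hypothetical piece size in bytes, kept tiny for the demo


def write_piece(path, piece_index, data):
    """Write one piece at its own offset; data is a list of ints in 0..255."""
    # "r+b" requires an existing file, so create an empty one on first use.
    if not os.path.exists(path):
        open(path, "wb").close()
    f = open(path, "r+b")
    try:
        f.seek(piece_index * PIECE_SIZE)
        f.write(struct.pack('%dB' % len(data), *data))
    finally:
        f.close()


path = os.path.join(tempfile.mkdtemp(), "pieces.bin")

# Pieces arrive out of order: index 1 first, then index 0.
write_piece(path, 1, [5, 6, 7, 8])
write_piece(path, 0, [1, 2, 3, 4])

f = open(path, "rb")
raw = f.read()
f.close()

# Unpack back into integers to inspect the on-disk order.
contents = list(struct.unpack('%dB' % len(raw), raw))
print(contents)
```

With append mode ("ab") the same two calls would have left the pieces in arrival order, i.e. piece 1 followed by piece 0.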

Thanks to those who responded with helpful suggestions.

A: 

This may have been answered previously in Python File Slurp w/ endian conversion.

Steven Rumbalski
Thanks for this reference. The numpy-based method described is very concise and speedy.
landstatic
+2  A: 

To save yourself some work you might like to use a bytearray (Python 2.6 and later):

b = [14, 254, 23, 35]
f = open("file", 'ab')
f.write(bytearray(b))

This does all the converting of your 0-255 values into bytes without the need for all the looping.

I can't see what your problem is otherwise without more information. If the data really is byte-wise then endianness isn't an issue, as others have said.
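A quick check of that claim, as a sketch using the struct module with made-up sample values: for the single-byte format code 'B' the byte-order prefix makes no difference, whereas for a two-byte code such as 'H' it does.

```python
import struct

values = [14, 254, 23, 35]  # sample values, each fits in one byte

# Pack the same single-byte values with each byte-order prefix.
little = struct.pack('<4B', *values)
big = struct.pack('>4B', *values)
native = struct.pack('4B', *values)
same_for_bytes = (little == big == native)

# Contrast: one 16-bit value really is laid out differently.
le16 = struct.pack('<H', 0x0EFE)
be16 = struct.pack('>H', 0x0EFE)
differs_for_shorts = (le16 != be16)

print(same_for_bytes)
print(differs_for_shorts)
```

Both flags come out True: byte order only describes how the bytes inside a multi-byte value are arranged, so a stream of individual bytes is written the same way regardless.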

(By the way, using bytes and file as variable names isn't a good idea, as it hides the built-ins of the same name.)

Scott Griffiths
Thanks for the advice, noted.
landstatic
I am limited to Python 2.5.4 for this project. Thanks for the insight into bytearray though Scott.
landstatic
+1  A: 

You can also use an array.array:

from array import array
f.write(array('B', bytes))

instead of

f.write(struct.pack('<'+'B'*len(bytes), *bytes))

which when tidied up a little is

f.write(struct.pack('B' * len(bytes), *bytes))
# the < is redundant; there is NO ENDIANNESS ISSUE

which if len(bytes) is "large" might be better as

f.write(struct.pack('%dB' % len(bytes), *bytes)) 
John Machin
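As a sanity check on the variants above, the following sketch (with sample data, and `data` used in place of `bytes` to avoid shadowing the built-in) compares the output of the repeated-'B' format, the counted '%dB' format and array.array. The `tobytes`/`tostring` switch is only there so the snippet runs on both Python 3 and Python 2.

```python
import struct
from array import array

data = [14, 254, 23, 35, 34, 67]  # sample values in 0..255

# Format string 'BBBBBB': one code per value, grows with len(data).
repeated = struct.pack('B' * len(data), *data)

# Format string '6B': a count prefix keeps the format short.
counted = struct.pack('%dB' % len(data), *data)

# array('B', ...) stores the values as raw unsigned bytes directly.
arr = array('B', data)
raw = arr.tobytes() if hasattr(arr, 'tobytes') else arr.tostring()

all_equal = (repeated == counted == raw)
print(all_equal)
print(len(raw))
```

All three produce the same six bytes, so the choice between them is a matter of convenience and of how long the format string gets for large pieces.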