My present task is to dissect tcpdump data that includes P2P messages, and I am having trouble with the piece data I acquire and write to a file on my x86 machine. I suspect a simple endianness issue with the bytes I write to the file.
I have a list of bytes holding a piece of P2P video, read and processed using the python-pcapy package:
```python
bytes = [14, 254, 23, 35, 34, 67, ...]
```
I am looking for a way to write these bytes, currently held in a list in my Python application, to a file.
Currently I write the pieces as follows:
```python
def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts):
    file = open(filename, "ab")
    # Iterate through bytes writing them to a file if don't have piece already
    if not self.piecemap[ipdst].has_key(pieceindex):
        for byte in bytes:
            file.write('%c' % byte)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))
        # Remember we have this piece now in case duplicates arrive
        self.piecemap[ipdst][pieceindex] = True
        # TODO: Collect stats
    file.close()
```
As you can see from the for loop, I write the bytes to the file in the same order as I process them from the wire (i.e. network or big-endian order).
Suffice it to say, the video that makes up the payload of the pieces does not play back well in VLC :-D
I think I need to convert them to little-endian byte order, but I am not sure of the best way to approach this in Python.
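Byte order only comes into play when several bytes are read together as one number, such as a 32-bit length or piece index in a message header; the payload bytes themselves are just copied in the order they arrive. A minimal sketch, assuming a hypothetical header layout modelled on a BitTorrent-style "piece" message (4-byte length, 1-byte message id, 4-byte piece index, 4-byte offset, all big-endian). The layout and the name `split_piece_message` are assumptions, not taken from the capture above:

```python
import struct

# Hypothetical header layout (assumed, BitTorrent-style "piece" message):
# 4-byte length, 1-byte message id, 4-byte piece index, 4-byte offset,
# all integers in network (big-endian) order, followed by the raw block.
HEADER = struct.Struct(">IBII")  # 13 bytes, no padding with '>'


def split_piece_message(raw):
    """Return (msg_id, index, begin, block) from one raw message.

    Only the integer header fields are decoded with an explicit byte
    order ('>'); the block is returned untouched, because a sequence of
    single bytes has no byte order to convert.
    """
    length, msg_id, index, begin = HEADER.unpack_from(raw, 0)
    # 'length' counts the id byte plus everything after it,
    # so the block ends at 4 + length.
    block = raw[HEADER.size:4 + length]
    return msg_id, index, begin, block
```

For example, `split_piece_message(struct.pack(">IBII", 12, 7, 5, 0) + bytes([14, 254, 23]))` returns `(7, 5, 0, b'\x0e\xfe\x17')`. Decoding the same index field with `'<I'` instead would yield 83886080 rather than 5, so a byte-order mistake shows up as wildly wrong integers, not as scrambled payload bytes.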
UPDATE
The solution that worked for me (writing P2P pieces and handling endianness appropriately) was:
```python
import struct  # needed at module level for struct.pack below

def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts):
    # "r+b" requires the file to exist already
    file = open(filename, "r+b")
    if not self.piecemap[ipdst].has_key(pieceindex):
        little = struct.pack('<' + 'B' * len(bytes), *bytes)
        # Seek to offset based on piece index
        file.seek(pieceindex * self.piecesize)
        file.write(little)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))
        # Remember we have this piece now in case duplicates arrive
        self.piecemap[ipdst][pieceindex] = True
    file.close()
```
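For comparison, the same offset-based approach in Python 3 might look like the sketch below. It is a hypothetical helper rather than the code above: `PieceWriter`, `piece_size` and `write_piece` are illustrative names, a set stands in for `self.piecemap[ipdst]` (assuming one writer per output file), and `bytes(...)` converts the list of ints in a single call, with no per-byte loop:

```python
import os


class PieceWriter:
    """Write fixed-size pieces at their offset in an output file.

    Hypothetical helper: piece_size plays the role of self.piecesize in
    the code above, and a set replaces the piecemap dictionary for
    duplicate detection (one writer per output file is assumed).
    """

    def __init__(self, filename, piece_size):
        self.filename = filename
        self.piece_size = piece_size
        self.written = set()
        # "r+b" fails if the file is missing, so create it first if needed.
        if not os.path.exists(filename):
            open(filename, "wb").close()

    def write_piece(self, piece_index, piece_bytes):
        """Write a piece once; return False if it was already written."""
        if piece_index in self.written:
            return False
        with open(self.filename, "r+b") as f:
            # Seeking past the current end is allowed; any gap left
            # before this piece reads back as zero bytes.
            f.seek(piece_index * self.piece_size)
            # bytes() turns a list of ints (0..255) into raw bytes as-is.
            f.write(bytes(piece_bytes))
        self.written.add(piece_index)
        return True
```

Because each piece lands at `piece_index * piece_size`, pieces that arrive out of order still end up in sequence in the file, which appending in arrival order does not guarantee. Naming the parameter `piece_bytes` also avoids shadowing the built-in `bytes`.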
The key to the solution was, as suspected, Python's struct module, in particular:

```python
little = struct.pack('<'+'B'*len(bytes), *bytes)
```
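One detail worth checking: the `'B'` format code packs a single unsigned byte, and a single byte has no byte order, so the `'<'` prefix should not change the output. The quick check below uses the sample values from the question to compare little-endian packing, big-endian packing and plain `bytes(data)`; the 16-bit lines show where the prefix does make a difference:

```python
import struct

data = [14, 254, 23, 35, 34, 67]

little = struct.pack('<' + 'B' * len(data), *data)
big = struct.pack('>' + 'B' * len(data), *data)

# 'B' is one unsigned byte, so the byte-order prefix changes nothing:
# all three give the same six bytes, in the same order as the list.
assert little == big == bytes(data)

# Byte order only shows up for multi-byte fields, e.g. a 16-bit value:
print(struct.pack('<H', 0x0102))  # b'\x02\x01'
print(struct.pack('>H', 0x0102))  # b'\x01\x02'
```

If that holds, the improvement in playback most likely came from seeking to `pieceindex * self.piecesize` before each write, so that pieces are placed by index instead of being appended in arrival order.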
Thanks to those who responded with helpful suggestions.