Hello

I am trying to decode data received over a TCP connection. The packets are small, no more than 100 bytes. However, when there are a lot of them, I receive some of the packets joined together. Is there a way to prevent this? I am using Python.

I have tried to separate the packets; my source is below. Each packet starts with an STX byte and ends with an ETX byte. The byte following the STX is the packet length (packet lengths less than 5 are invalid), and the checksum is the last bytes before the ETX.

def decode(data):
    while True:
        start = data.find(STX)
        if start == -1 or start + 1 >= len(data):  # no STX, or no length byte after it
            pkt = ''
            data = ''
            break
        # STX found, next byte is the length (counted from the STX)
        pktlen = ord(data[start + 1])
        end = start + pktlen  # index one past the ETX
        # check the packet fits, ends in ETX (at end - 1) and has a valid checksum
        if pktlen < 5 or end > len(data) or data[end - 1] != ETX or not checksum_valid(data[start:end]):
            print("Invalid Pkt")
            data = data[start + 1:]
            continue
        else:
            pkt = data[start:end]
            data = data[end:]
            break

    return data, pkt

I use it like this

#process reports
try:
    data = sock.recv(256)
except: continue
else:
    while data:
        data, pkt = decode(data)
        if pkt:
            process(pkt)
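
To make the frame layout described above concrete, here is a minimal Python 3 sketch that builds and checks one packet. Several things are assumptions, since the post does not state them: STX and ETX are given their ASCII values (0x02 and 0x03), the checksum is taken to be a single byte holding the XOR of everything before it, and `xor_checksum` and `build_packet` are hypothetical helper names.

```python
# Assumed values: the ASCII STX/ETX codes; the real protocol may use others.
STX = 0x02
ETX = 0x03


def xor_checksum(data):
    """Hypothetical checksum: XOR of all bytes (the real algorithm is not given)."""
    result = 0
    for byte in data:
        result ^= byte
    return result


def build_packet(payload):
    """Frame a payload as STX, length, payload, checksum, ETX."""
    length = len(payload) + 4  # STX + length byte + checksum byte + ETX
    body = bytes([STX, length]) + payload
    return body + bytes([xor_checksum(body), ETX])


def checksum_valid(packet):
    """Compare the byte before ETX with the checksum of everything before it."""
    return xor_checksum(packet[:-2]) == packet[-2]


packet = build_packet(b"hi")
print(packet.hex(), checksum_valid(packet))
```

With a one-byte checksum, the smallest possible packet carries one payload byte and is 5 bytes long, which matches the rule that lengths below 5 are invalid.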

Also, if there are multiple packets in the data stream, is it best to return the packets as a collection of lists or just return the first packet?

I am not that familiar with Python, only C. Is this method OK? Any advice would be most appreciated. Thanks in advance.

Thanks

+3  A: 

TCP provides a data stream, not individual packets, at the interface level. If you want discrete packets, you can use UDP (and handle lost or out of order packets on your own), or put some data separator inline. It sounds like you are doing that already, with STX/ETX as your separators. However, as you note, you get multiple messages in one data chunk from your TCP stack.

Note that unless you are doing some other processing, data in the code you show does not necessarily contain an integral number of messages. That is, it is likely that the last STX will not have a matching ETX. The ETX will be in the next data chunk without an STX.

You should probably read individual messages from the TCP data stream and return them as they occur.

mpez0
Thanks mpez0, could you elaborate on the last line of your reply? Do you mean that if I have data that contains, say, three packets, I should return (1) the first packet found and (2) the data minus the first packet, then call the subroutine again until there are no packets remaining in the data? Thanks
mikip
Yes. Combine the read from TCP and initial parse from the data stream in one routine that can handle splitting of your messages between TCP reads. Call that routine to get the next message (or, if you prefer, a list of available messages) or a flag return for no messages available. Not sure if that's the best or typical Python idiom, but it will work.
mpez0
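
One way to picture the routine described in the comment above is a small Python 3 reader that owns the receive buffer and hands back one message per call, returning `None` as the "no more messages" flag. This is only a sketch: the `PacketReader` class and its method names are hypothetical, the STX/ETX values are assumed to be the ASCII codes, and the checksum test is left out to keep the buffering logic visible.

```python
STX = 0x02   # assumed value
ETX = 0x03   # assumed value
MIN_LEN = 5  # shorter lengths are invalid according to the question


class PacketReader:
    """Pull-style reader: next_packet() returns one packet, or None at end of stream."""

    def __init__(self, recv):
        self._recv = recv        # any callable that behaves like sock.recv
        self._buf = bytearray()  # bytes received but not yet consumed

    def _take(self):
        """Return one framed packet from the buffer, or None if more data is needed."""
        while True:
            start = self._buf.find(STX)
            if start == -1:
                self._buf.clear()          # nothing useful buffered
                return None
            del self._buf[:start]          # drop noise before the STX
            if len(self._buf) < 2:
                return None                # length byte not received yet
            length = self._buf[1]
            if length >= MIN_LEN and len(self._buf) < length:
                return None                # packet is split across reads
            if length < MIN_LEN or self._buf[length - 1] != ETX:
                del self._buf[0]           # false STX: skip it and resync
                continue
            packet = bytes(self._buf[:length])
            del self._buf[:length]
            return packet

    def next_packet(self):
        while True:
            packet = self._take()
            if packet is not None:
                return packet
            chunk = self._recv(256)
            if not chunk:
                return None                # connection closed
            self._buf += chunk


# Demo with a fake recv() that splits two packets across three reads.
first = b"\x02\x05k\x00\x03"
second = b"\x02\x06hi\x00\x03"
stream = first + second
chunks = iter([stream[:3], stream[3:7], stream[7:], b""])
reader = PacketReader(lambda size: next(chunks))

packets = []
while True:
    packet = reader.next_packet()
    if packet is None:
        break
    packets.append(packet)
```

With a real connection the reader would be created as `PacketReader(sock.recv)`. Because leftover bytes stay in the buffer between calls, it does not matter how TCP happens to slice the stream.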
A: 

Where does the data come from? Instead of trying to decode it by hand, why not use the excellent Impacket package:

http://oss.coresecurity.com/projects/impacket.html

fraca7
I want to use Python for this, I need to subsequently process the data
mikip
@mikip, visit the link... impacket is a Python solution. Do you mean "pure Python" for some reason? Better explain why then...
Peter Hansen
Impacket *is* pure Python anyway. It's pcapy that uses a C extension, but here mikip seems to already have done the capture part.
fraca7
Hi Peter, I had a look at impacket; it looks far too complicated for me at the moment. Yes, I would prefer to use pure Python. Thanks
mikip
+3  A: 

Try scapy, a powerful interactive packet manipulation program.

Oli
I want to use Python for this; I need to subsequently process the data
mikip
@mikip, so what do you think the "py" in "scapy" stands for? ;-)
Peter Hansen
+3  A: 

I would create a class that is responsible for decoding the packets from a stream, like this:

class PacketDecoder(object):

    STX = ...
    ETX = ...

    def __init__(self):
        self._stream = ''

    def feed(self, buffer):
        self._stream += buffer

    def decode(self):
        '''
        Yields packets from the current stream.
        '''
        while len(self._stream) > 2:
            end = self._stream.find(self.ETX)
            if end == -1:
                break

            packet_len = ord(self._stream[1])
            packet = self._stream[:end]
            if packet_len >= 5 and check_sum_valid(packet):
                yield packet
            self._stream = self._stream[end+1:]

And then use like this:

decoder = PacketDecoder()
while True:
    data = sock.recv(256)
    if not data:
        # handle lost connection...
        break
    decoder.feed(data)
    for packet in decoder.decode():
        process(packet)
Bruno Oliveira
Thanks Bruno, I don't fully understand it yet, as I'm new to Python and not familiar with generators. However, it is an elegant solution.
mikip
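
For readers new to generators, a Python 3 variation on the feed/decode idea may help: `decode()` runs until it reaches a `yield`, hands one packet to the caller, and resumes from that point when the next packet is requested. The sketch below is a variant, not the answer's code. It frames packets by the length byte instead of searching for ETX (so an ETX value inside the payload or checksum cannot split a packet), it assumes the ASCII values for STX/ETX, and it takes the checksum test as a parameter; `xor_ok` is a hypothetical stand-in for the real check.

```python
class PacketDecoder:
    STX = 0x02  # assumed value
    ETX = 0x03  # assumed value

    def __init__(self, checksum_valid):
        self._stream = bytearray()
        self._checksum_valid = checksum_valid

    def feed(self, data):
        """Append newly received bytes to the internal buffer."""
        self._stream += data

    def decode(self):
        """Yield each complete, valid packet; keep any partial packet buffered."""
        buf = self._stream
        while True:
            start = buf.find(self.STX)
            if start == -1:
                buf.clear()
                return
            del buf[:start]              # drop anything before the STX
            if len(buf) < 2:
                return                   # length byte not received yet
            length = buf[1]
            if length >= 5 and len(buf) < length:
                return                   # incomplete packet: wait for more data
            packet = bytes(buf[:length])
            if length >= 5 and packet[-1] == self.ETX and self._checksum_valid(packet):
                del buf[:length]
                yield packet
            else:
                del buf[0]               # false STX: skip it and resync


def xor_ok(packet):
    """Hypothetical checksum rule: XOR of all bytes before the checksum byte."""
    value = 0
    for byte in packet[:-2]:
        value ^= byte
    return value == packet[-2]


decoder = PacketDecoder(xor_ok)
decoder.feed(b"\x02\x06hi\x05\x03\x02\x06h")  # one full packet plus part of the next
first_batch = list(decoder.decode())
decoder.feed(b"i\x05\x03")                     # the rest arrives in a later read
second_batch = list(decoder.decode())
```

Each call to `decode()` yields only the packets that are complete at that moment, so the half-received packet from the first `feed()` is produced after the second one.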
A: 

Nice and simple... :) The trick is in the file object.

f = sock.makefile()
while True:
    stx = f.read(1)
    pktlen = f.read(1)
    if not stx or not pktlen:  # an empty read means the connection was closed
        break
    wholePacket = stx + pktlen + f.read(ord(pktlen) - 2)
    doSomethingWithPacket(wholePacket)

And that's it! (There is also no need to check checksums when using TCP.)

And here is a more "robust"(?) version (it uses STX and checksum):

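f = sock.makefile()
while True:
    # skip bytes until the next STX; an empty read means the connection closed
    byte = f.read(1)
    while byte and byte != STX:
        byte = f.read(1)
    if not byte:
        break
    pktlen = f.read(1)
    if not pktlen:
        break
    wholePacket = STX + pktlen + f.read(ord(pktlen) - 2)
    if checksum_valid(wholePacket):
        doSomethingWithPacket(wholePacket)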
Kalmi
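
The file-object approach can be tried without a network connection, because any binary file-like object behaves the same way. Below is a minimal Python 3 sketch using `io.BytesIO` as a stand-in for `sock.makefile("rb")`. The `read_packets` name is hypothetical, STX is assumed to be 0x02, and the checksum test is omitted, as in the simple version above.

```python
import io

STX = b"\x02"  # assumed value


def read_packets(stream):
    """Yield STX-framed packets from a binary file-like object until EOF."""
    while True:
        byte = stream.read(1)
        if not byte:
            return                  # EOF: the peer closed the connection
        if byte != STX:
            continue                # skip noise until the next STX
        length_byte = stream.read(1)
        if not length_byte:
            return
        length = length_byte[0]
        if length < 5:
            continue                # invalid length: keep scanning
        rest = stream.read(length - 2)
        if len(rest) < length - 2:
            return                  # stream ended in the middle of a packet
        yield STX + length_byte + rest


# BytesIO stands in for the object returned by sock.makefile("rb").
fake_socket_file = io.BytesIO(b"junk\x02\x06hi\x05\x03\x02\x05kl\x03")
packets = list(read_packets(fake_socket_file))
```

On a blocking socket, `read(n)` on the buffered file object waits until `n` bytes have arrived or the stream ends, which is why no explicit reassembly buffer is needed here.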