tags:

views:

204

answers:

7

Yesterday I had to parse a very simple binary data file - the rule is, look for two bytes in a row that are both 0xAA, then the next byte will be a length byte, then skip 9 bytes and output the given amount of data from there. Repeat to the end of the file.

My solution did work, and was very quick to put together (even though I am a C programmer at heart, I still think it was quicker for me to write this in Python than it would have been in C) - BUT, it is clearly not at all Pythonic and it reads like a C program (and not a very good one at that!)

What would be a better / more Pythonic approach to this? Is a simple FSM like this even still the right choice in Python?

My solution:

#! /usr/bin/python

import sys

f = open(sys.argv[1], "rb")

state = 0

if f:
    for byte in f.read():
        a = ord(byte)       
        if state == 0:
            if a == 0xAA:
                state = 1
        elif state == 1:
            if a  == 0xAA:
                state = 2
            else: 
                state = 0
        elif state == 2:
            count = a;
            skip = 9
            state = 3
        elif state == 3:
            skip = skip -1
            if skip == 0:
                state = 4
        elif state == 4:
             print "%02x" %a
             count = count -1 
             if count == 0:
                 state = 0
                 print "\r\n"
A: 

I think your solution looks fine, except you should replace count = count - 1 with count -= 1.

This is one of those times where fancy code-show-offs will come up ways of have dicts mapping states to callables, with a small driver function, but it isn't better, just fancier, and using more obscure language features.

Ned Batchelder
+2  A: 

The coolest way I've seen to implement FSMs in Python has to be via generators and coroutines. See this Charming Python post for an example. Eli Bendersky also has an excellent treatment of the subject.

If coroutines aren't familiar territory, David Beazley's A Curious Course on Coroutines and Concurrency is a stellar introduction.

Hank Gay
Wow! Thanks---I've started reading the Beazley reference, and it's going to take most of the evening, but it has already been worth it.
Pierce
It's phenomenal. I come from a heavily (OK, entirely) imperative background, and it completely blew my mind the first time I read it.
Hank Gay
Interesting - thank you! Probably overkill for my specific case, but very cool nonetheless.
Vicky
A: 

I suggest checking out chapter 4 of Text Processing in Python by David Mertz. He implements a state machine class in Python that is very elegant.

Brandon Corfman
+4  A: 

You could give your states constant names instead of using 0, 1, 2, etc. for improved readability.

You could use a dictionary to map (current_state, input) -> (next_state), but that doesn't really let you do any additional processing during the transitions. Unless you include some "transition function" too to do extra processing.

Or you could do a non-FSM approach. I think this will work as long as 0xAA 0xAA only appears when it indicates a "start" (doesn't appear in data).

with open(sys.argv[1], 'rb') as f:
    contents = f.read()
    for chunk in contents.split('\xaa\xaa')[1:]:
        length = ord(chunk[0])
        data = chunk[10:10+length]
        print data

If it does appear in data, you can instead use string.find('\xaa\xaa', start) to scan through the string, setting the start argument to begin looking where the last data block ended. Repeat until it returns -1.

FogleBird
Brilliant - thank you! I knew there'd be some way like this but couldn't quite get my head round it.
Vicky
Oh - 0xAA 0xAA does not appear in valid data. It might conceivably appear in the junk between chunks of valid data, in which case neither your solution nor mine (nor any other solution AFAICT) can handle it.
Vicky
A: 

I think the most pythonic way would by like what FogleBird suggested, but mapping from (current state, input) to a function which would handle the processing and transition.

pbh101
A: 

I am a little apprehensive about telling anyone what's Pythonic, but here goes. First, keep in mind that in python functions are just objects. Transitions can be defined with a dictionary that has the (input, current_state) as the key and the tuple (next_state, action) as the value. Action is just a function that does whatever is necessary to transition from the current state to the next state.

There's a nice looking example of doing this at http://code.activestate.com/recipes/146262-finite-state-machine-fsm. I haven't used it, but from a quick read it seems like it covers everything.

A similar question was asked/answered here a couple of months ago: http://stackoverflow.com/questions/2101961/python-state-machine-design. You might find looking at those responses useful as well.

Pierce
A: 

You can use regexps. Something like this code will find the first block of data. Then it's just a case of starting the next search from after the previous match.

find_header = re.compile('\xaa\xaa(.).{9}', re.DOTALL)
m = find_header.search(input_text)
if m:
    length = chr(find_header.group(1))
    data = input_text[m.end():m.end() + length]
Paul Hankin