I am writing a Python script that looks at common computer files and examines them for similar bytes, words, and double words. I need to see the files in hex, but I cannot seem to get Python to open a simple file that way. I have tried codecs.open with hex as the encoding, but when I operate on the file descriptor it always spits back

  File "main.py", line 41, in <module>
    main()
  File "main.py", line 38, in main
    process_file(sys.argv[1])
  File "main.py", line 27, in process_file
    seeker(line.rstrip("\n"))
  File "main.py", line 15, in seeker
    for unit in f.read(2):
  File "/usr/lib/python2.6/codecs.py", line 666, in read
    return self.reader.read(size)
  File "/usr/lib/python2.6/codecs.py", line 472, in read
    newchars, decodedbytes = self.decode(data, self.errors)
  File "/usr/lib/python2.6/encodings/hex_codec.py", line 50, in decode
    return hex_decode(input,errors)
  File "/usr/lib/python2.6/encodings/hex_codec.py", line 42, in hex_decode
    output = binascii.a2b_hex(input)
TypeError: Non-hexadecimal digit found





def seeker(_file):
    f = codecs.open(_file, "rb", "hex")
    for LINE in f.read():
        print LINE
    f.close()

I really just want to view files and operate on them as if they were open in a hex editor like xxd. Also, is it possible to read a file in increments of, say, a word at a time?

No, this is not homework.

+3  A: 

codecs.open(_file, "rb", "hex") tries to decode the file's contents as if they were already hex digits, which is why it fails: the raw bytes in the file are not valid hex text.
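To make the direction of the conversion concrete, here is a minimal sketch, assuming the goal is simply to view raw bytes as hex text. It relies on `binascii.hexlify` from the standard library; the helper names `to_hex` and `file_to_hex` are illustrative and do not come from the question.

```python
import binascii

def to_hex(raw):
    """Turn raw bytes into a string of hex digits (two digits per byte)."""
    # hexlify goes bytes -> hex digits, the opposite direction from
    # reading (decoding) a file through the "hex" codec.
    return binascii.hexlify(raw).decode("ascii")

def file_to_hex(path):
    """Read a whole file in plain binary mode and return it as hex text."""
    # No codec here: "rb" hands back the bytes untouched.
    with open(path, "rb") as f:
        return to_hex(f.read())

print(to_hex(b"\x00\xffAB"))  # prints 00ff4142
```

For large files, reading everything at once may not be desirable; reading in fixed-size chunks avoids that.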

Considering your other "word at a time" target (I assume you mean "computer word", i.e. 32 bits?), you'll be better off encapsulating the open file into a class of your own. E.g.:

class HexFile(object):
    def __init__(self, fp, wordsize=4):
        self.fp = fp
        self.ws = wordsize
    def __iter__(self):
        while True:
            data = self.fp.read(self.ws)
            if not data: break
            yield data.encode('hex')

plus whatever other utility methods you'd find helpful, of course.
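As a rough usage sketch, here is a variant of the wrapper above that swaps `data.encode('hex')` for `binascii.hexlify`, on the assumption that the code should also run on Python 3, where byte strings have no `encode` method. The class name `HexWords` and the in-memory `io.BytesIO` sample are only for illustration; a real file opened with `open(path, 'rb')` would be passed in the same way.

```python
import binascii
import io

class HexWords(object):
    """Iterate over a binary file object in fixed-size chunks, as hex text."""

    def __init__(self, fp, wordsize=4):
        self.fp = fp
        self.ws = wordsize

    def __iter__(self):
        while True:
            data = self.fp.read(self.ws)
            if not data:
                # read() returns an empty byte string at end of file
                break
            yield binascii.hexlify(data).decode("ascii")

# io.BytesIO stands in for a file opened in binary mode.
sample = io.BytesIO(b"\xde\xad\xbe\xef\x00\x01")
words = list(HexWords(sample))
print(words)  # the final chunk is shorter when the size is not a multiple of 4
```

Note that the last item can hold fewer than `wordsize` bytes, so code that compares words may want to treat a short trailing chunk separately.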

Alex Martelli
Thank you Alex, though I must say I have never seen this method of file representation before. I take it HexFile takes a descriptor? Maybe you can explain a bit more if you have time; I am quite interested.
Recursion
Alex is just creating a wrapper class that wraps around a file descriptor object and allows you to iterate over it in word-sized blocks.
Amber
@Recursion, you just code `f=HexFile(open('data','rb'))` and then `for hexword in f: ...`. I'm not sure what you mean by "file representation" -- as Amber says this is just a utility wrapper.
Alex Martelli
Thank you Alex, can you suggest a place where I can learn more about this?
Recursion
@Recursion, learn more about what? If Python programming in general, that's what I've tried to do in my two books, Python Cookbook and Python in a Nutshell.
Alex Martelli
+1  A: 

You can read a set number of bytes by passing an integer argument to read:

word = file.read(4)  # 4 bytes = 32 bits

You can seek to a position in the file using seek:

file.seek(100) # Seeks to byte 100
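Putting the two calls together, a small sketch might look like the following. The helper `read_word_at` is a hypothetical name, and `io.BytesIO` is used here only so the example runs without a real file; any object opened with `open(path, 'rb')` supports the same `seek` and `read` calls.

```python
import io

def read_word_at(fp, offset, size=4):
    """Seek to an absolute byte offset, then read up to `size` bytes."""
    fp.seek(offset)
    return fp.read(size)

# 16 bytes of sample data standing in for a binary file.
sample = io.BytesIO(b"ABCDEFGHIJKLMNOP")

word = read_word_at(sample, 8)
print(word)           # the four bytes starting at offset 8: IJKL
print(sample.tell())  # the position has advanced past the word: 12
```

If the offset is near the end of the file, `read` returns fewer bytes than requested, and an empty byte string once the end is reached.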
Amber