I want to be able to open an image file and extract the hexadecimal values byte by byte. I have no idea how to do this, and googling "python byte editing" and "python byte array" surprisingly didn't turn up anything. Can someone point me towards the library I need to use, specific methods I can google, or tutorials/guides?

+2  A: 

The Hachoir framework is a set of Python libraries and tools to parse and edit binary files:

http://pypi.python.org/pypi/hachoir-core

It has knowledge of common file types, so this could be just what you need.

Coox
+6  A: 

The Python standard library has the mmap module, which can be used to do exactly this. Take a look at the documentation for further information.
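
A minimal sketch of what editing through mmap could look like, assuming Python 3. The file name sample.bin and its contents are made up so the example runs on its own; with a real image you would map that file instead.

```python
import mmap
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained
# (the name and contents are hypothetical).
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(bytes(range(16)))

# Open for reading and writing in binary mode, then map the whole file
# (a length of 0 means "map the entire file").
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        first_byte = mm[0]        # indexing returns an int in Python 3
        mm[0] = 0xFF              # overwrite a single byte in place
        mm[1:3] = b"\xAA\xBB"     # slice assignment must keep the same length
        mm.flush()                # write the changes back to the file

# Read the file back the ordinary way to see the patched bytes.
with open(path, "rb") as f:
    patched = f.read()
```

The mapping behaves much like a mutable byte sequence, but it cannot change the size of the file through slice assignment.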

af
+1. Normally I'd load the file into memory to edit as in sth's answer, but if the file may be very long, mmap is better. Of course if the file is very *very* long and won't fit in your address space, it's back to open(path, 'r+b') and seek()...
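
For illustration, the open(path, 'r+b') and seek() fallback might look roughly like this in Python 3. The file big.bin, the offset 512 and the bytes written are all invented for the sketch.

```python
import os
import tempfile

# Hypothetical stand-in for a large file: 1 KiB of zero bytes.
path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1024)

# 'r+b' opens an existing file for reading and writing without truncating it.
with open(path, "r+b") as f:
    f.seek(512)                      # jump to an arbitrary byte offset
    chunk = f.read(4)                # read only the bytes of interest
    f.seek(512)                      # reading moved the position, so go back
    f.write(b"\xDE\xAD\xBE\xEF")     # overwrite those 4 bytes in place

# Check the result: the bytes changed and the file size did not.
with open(path, "rb") as f:
    f.seek(512)
    result = f.read(4)
size = os.path.getsize(path)
```

Only the bytes that are read or written pass through memory, so the overall file size does not matter.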
bobince
@bobince: at what point in your opinion would a file be "too long" to go with sth's answer and to move onto mmap?
hatorade
@hatorade: Standard open/read/close can handle files as large as available memory, but you'll see performance improvements by using mmap() because only the pages you actually touch are read from disk. I'd estimate that significant performance differences become apparent once the file reaches a megabyte or so.
John Millikin
+6  A: 

Depending on what you want to do, it might be enough to open the file in binary mode and read the data with the normal file functions:

# load it
with open("somefile", "rb") as f:
    data = f.read()

# do something with data (here: reverse the byte order)
data = data[::-1]

# save it
with open("somefile.new", "wb") as f:
    f.write(data)

When a file is opened in binary mode, Python hands you the bytes exactly as stored and doesn't try to interpret them as text. If you just want to make simple modifications to a file of reasonable size, this is probably good enough.
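
To get at the hexadecimal values byte by byte, something along these lines should work in Python 3. The sample bytes are arbitrary stand-ins for what f.read() would return, and the variable names are only illustrative.

```python
# Arbitrary sample bytes standing in for the result of f.read().
data = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A])

# Iterating over a bytes object yields ints in the range 0-255.
hex_values = [format(b, "02x") for b in data]
# hex_values is ['89', '50', '4e', '47', '0d', '0a']

# bytes.hex() gives the same digits as one string.
hex_string = data.hex()
# hex_string is '89504e470d0a'

# bytes is immutable; copy into a bytearray to change individual bytes.
edited = bytearray(data)
edited[0] = 0x00
```

A bytearray can be passed to f.write() directly when saving the result.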

sth
Yeah, I just want to open a custom image file and convert it to .tiff. This might do the trick, since I'm basically "undoing" the algorithm used to assign the pixel data in the custom image file and reorganizing it per the .tif specification.
hatorade
+1  A: 

Check out the struct module.

This module performs conversions between Python values and C structs represented as Python strings. It uses format strings (explained below) as compact descriptions of the lay-out of the C structs and the intended conversion to/from Python values. This can be used in handling binary data stored in files or from network connections, among other sources.
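
As a rough illustration in Python 3 (where the packed data is a bytes object), here is a made-up 8-byte header being packed and unpacked. The field layout is purely hypothetical and does not describe any real image format.

```python
import struct

# Hypothetical header layout: "<" = little-endian, no padding;
# "H" = unsigned 16-bit integer, "I" = unsigned 32-bit integer.
HEADER_FORMAT = "<HHI"

# Pack width, height and pixel count into raw bytes.
header = struct.pack(HEADER_FORMAT, 640, 480, 640 * 480)
# header is b'\x80\x02\xe0\x01\x00\xb0\x04\x00'

# Unpack the same bytes back into Python integers.
width, height, pixel_count = struct.unpack(HEADER_FORMAT, header)

# calcsize reports how many bytes the format occupies.
header_size = struct.calcsize(HEADER_FORMAT)
```

The format string has to match the actual layout of the file being read, including its byte order.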

Matthew Marshall