I'm trying to read a binary file (which represents a matrix in Matlab) in Python. But I am having trouble reading the file and converting the bytes to the correct values.

The binary file consists of a sequence of 4-byte numbers. The first two numbers are the number of rows and columns respectively. My friend gave me the Matlab function he wrote that creates the file using fwrite. To read it, I would like to do something like this:

f = open(filename, 'rb')
rows = f.read(4)
cols = f.read(4)
m = [[0 for c in cols] for r in rows]
r = c = 0
while True:
    if c == cols:
        r += 1
        c = 0
    num = f.read(4)
    if num:
        m[r][c] = num
        c += 1
    else:
        break

But whenever I use f.read(4), I get something like '\x00\x00\x00\x04' (this specific example should represent a 4), and I can't figure out how to convert it into the correct number (using int, hex or anything like that doesn't work). I stumbled upon struct.unpack, but that didn't seem to help very much.

Here is an example matrix and the corresponding binary file that the Matlab function created for it (as it appears when I read the entire file using the Python function f.read() without any size parameter):

4     4     2     4
2     2     2     1
3     3     2     4
2     2     6     2

'\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00'

So the first 4 bytes and the 5th-8th bytes should both be 4, as the matrix is 4x4, and the values after that should be 4, 4, 2, 4, 2, 2, 2, 1, etc.

Thanks guys!

+7  A: 
rows = f.read(4)
cols = f.read(4)

both names are now bound to 4-byte strings. To turn them into integers instead,

import struct

rowsandcols = f.read(8)
rows, cols = struct.unpack('=ii', rowsandcols)

See the docs for struct.unpack.
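The first character of the format string selects the byte order, and it has to match the order the file was written in. As a minimal sketch (Python 3 syntax, using only the first four bytes quoted in the question), the same bytes decode to different integers depending on that prefix. Note that `=` means the native byte order of the machine running the code, so its result is not shown here.

```python
import struct

# First four bytes of the file shown in the question.
raw = b'\x00\x00\x00\x04'

big = struct.unpack('>i', raw)[0]      # big-endian: most significant byte first
little = struct.unpack('<i', raw)[0]   # little-endian: least significant byte first
network = struct.unpack('!i', raw)[0]  # network order, which is big-endian

print(big, little, network)  # 4 67108864 4
```

Since the expected value is 4, this sample appears to have been written in big-endian order.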

Alex Martelli
It didn't work for me =/
>>> import struct
>>> f = open('Z:\summer reu 2010\m.dat','rb')
>>> rowsandcols = f.read(8)
>>> rows, cols = struct.unpack('=ii',rowsandcols)
>>> rows
67108864
>>> cols
67108864
rows and cols should both be 4
Daniel Waltrip
gahh i can't format my comment. Here is a screenshot: http://i47.tinypic.com/14ub18n.jpg
Daniel Waltrip
Considering the data is described as being big-endian and that most popular CPUs today are little-endian, perhaps it should be `!` or `>` instead of `=` ?
Nas Banov
Yes, that worked, Nas. Can someone please explain what all these different formats actually mean? What is big-endian/little-endian and native/standard?
Daniel Waltrip
I looked "endianness" up on wikipedia, sorry to bother you all. Thank you very much for the help! =)
Daniel Waltrip
+2  A: 

I looked a bit more into your problem; I had never used struct before, so it was a good learning activity. It turns out there are a couple of twists. First, the numbers after the two header integers are not stored as 4-byte integers but as 4-byte floats in big-endian form. Second, if your example is correct, the matrix was not stored by rows, as one would expect, but by columns. That is, it was output like so (pseudocode):

for j in cols:
  for i in rows:
    write Aij to file
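Both twists can be checked against the bytes from the question. The sketch below (Python 3 syntax; the variable names are only for illustration) decodes the 16 bytes that follow the 8-byte header, once as integers and once as floats.

```python
import struct

# Bytes 9-24 of the example file: the first four stored values.
chunk = b'@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00'

# Read as big-endian integers, the values make no sense for this matrix.
as_ints = struct.unpack('>4i', chunk)

# Read as big-endian 4-byte floats, they are small whole numbers.
as_floats = struct.unpack('>4f', chunk)

print(as_ints)    # (1082130432, 1073741824, 1077936128, 1073741824)
print(as_floats)  # (4.0, 2.0, 3.0, 2.0)
```

The decoded floats 4, 2, 3, 2 match the first column of the example matrix, not its first row (4, 4, 2, 4), which is what points to column-by-column storage.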

So I had to transpose the result after reading. Here is the code that you need given the example:

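import struct

def readMatrix(f):
    # header: two big-endian 4-byte integers (rows, then columns)
    rows, cols = struct.unpack('>ii', f.read(8))
    # the data is stored column by column, so each chunk of `rows`
    # big-endian 4-byte floats is one column of the matrix
    m = [ list(struct.unpack('>%df' % rows, f.read(4*rows)))
             for c in range(cols)
        ]
    # transpose the list of columns to get the rows
    return zip(*m)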

And here we test it:

>>> from StringIO import StringIO
>>> f = StringIO('\x00\x00\x00\x04\x00\x00\x00\x04@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')
>>> mat = readMatrix(f)
>>> for row in mat:
...     print row
...     
(4.0, 4.0, 2.0, 4.0)
(2.0, 2.0, 2.0, 1.0)
(3.0, 3.0, 2.0, 4.0)
(2.0, 2.0, 6.0, 2.0)
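
The session above is Python 2 (StringIO module, print statement). For Python 3, a roughly equivalent sketch might look like the following. It assumes the same layout as the example file, and `read_matrix_py3` is a hypothetical name, not something from the question. The main differences are that the data must be a bytes object wrapped in io.BytesIO, and that zip returns an iterator, so it is wrapped in list.

```python
import io
import struct

def read_matrix_py3(f):
    """Read a column-major matrix of big-endian 4-byte floats (hypothetical helper)."""
    # Header: two big-endian 4-byte integers, rows then columns.
    rows, cols = struct.unpack('>ii', f.read(8))
    # Each chunk of `rows` floats is one column of the matrix.
    columns = [struct.unpack('>%df' % rows, f.read(4 * rows)) for _ in range(cols)]
    # Transpose the columns into a list of row tuples.
    return list(zip(*columns))

# The example file from the question, as a bytes literal.
data = (b'\x00\x00\x00\x04\x00\x00\x00\x04'
        b'@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00'
        b'@\x80\x00\x00@\x00\x00\x00@@\x00\x00@\x00\x00\x00'
        b'@\x00\x00\x00@\x00\x00\x00@\x00\x00\x00@\xc0\x00\x00'
        b'@\x80\x00\x00?\x80\x00\x00@\x80\x00\x00@\x00\x00\x00')

mat = read_matrix_py3(io.BytesIO(data))
for row in mat:
    print(row)
```

With a real file, the same function should work on a handle opened with open(filename, 'rb').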
Nas Banov
Your answer was better, my apologies. However, I don't know if it was just my machine, but I had to use "!" instead of ">" for struct.unpack
Daniel Waltrip
@Daniel: hm, that's weird if '!' and '>' give you different results; it seems to me they should be the same. The documentation says `The form "!" [network order = big-endian] is available for those poor souls who claim they can't remember whether network byte order is big-endian [">"] or little-endian ["<"]`. But if it works, don't touch it - it ain't broken :)
Nas Banov