How do I go about opening a binary data file in Python and reading back the values one long at a time, into a struct? I have something like this at the moment, but I think it will keep overwriting idList. I want to append to it, so that I end up with a tuple of all the long values in the file:

    file = open(filename, "rb")
    try:
        bytes_read = file.read(struct.calcsize("=l"))
        while bytes_read:
            # Read 4 bytes (long integer)
            idList = struct.unpack("=l", bytes_read)
            bytes_read = file.read(struct.calcsize("=l"))
    finally:
        file.close()

Thanks.

+4  A: 

Simplest (python 2.6 or better):

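    import array
    idlist = array.array('l')  # itemsize of 'l' is platform-dependent (4 or 8 bytes); check idlist.itemsize
    with open(filename, "rb") as f:
        while True:
            try:
                idlist.fromfile(f, 2000)  # read up to 2000 items per call
            except EOFError:
                break  # items read before end-of-file are already in idlist
    idtuple = tuple(idlist)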

Tuples are immutable, so they can't be built incrementally: you have to build a different (mutable) sequence, then call tuple on it at the end. If you don't specifically need a tuple, of course, you can skip that last, costly step and keep the array or list or whatever. Avoiding trampling over built-in names like file is advisable anyway ;-).
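As a minimal sketch of the "build something mutable, then convert" point: the values below are a hypothetical stand-in for numbers read from a file, and the names `values`, `id_list` and `id_tuple` are only for illustration.

```python
# Hypothetical stand-in for values that would be read from the binary file.
values = (10, 20, 30)

# Accumulate into a mutable list first...
id_list = []
for v in values:
    id_list.append(v)

# ...then convert once at the end, only if a tuple is really required.
id_tuple = tuple(id_list)

# Tuples expose no append method, which is why the list is needed.
has_append = hasattr(id_tuple, "append")
```

If the rest of the program only iterates over the values, `id_list` can be used directly and the final conversion dropped.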

If you have to use the struct module for a job that's best handled by the array module (e.g., because of a bet):

    import struct

    idList = []
    with open(filename, "rb") as f:
        while True:
            bytes_read = f.read(struct.calcsize("=l"))  # 4 bytes per item
            if not bytes_read:
                break
            oneid = struct.unpack("=l", bytes_read)[0]  # unpack returns a 1-tuple
            idList.append(oneid)

The with statement (also available in 2.5 with an import from the future) is clearer and more concise than the old try/finally.
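To make the with versus try/finally comparison concrete, here is a rough, self-contained sketch that reads the same small file both ways. The helper names (`read_ids_with`, `read_ids_try_finally`), the temporary demo file and the sample values are all assumptions made for illustration, not part of the original question.

```python
import os
import struct
import tempfile

FMT = "=l"                   # standard-size (4-byte) signed long, native byte order
SIZE = struct.calcsize(FMT)  # 4

def read_ids_with(path):
    """Read every 4-byte long using a with block; also report whether f got closed."""
    ids = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(SIZE)
            if not chunk:
                break
            ids.append(struct.unpack(FMT, chunk)[0])
    return ids, f.closed

def read_ids_try_finally(path):
    """Same loop, with the file closed by an explicit try/finally."""
    ids = []
    f = open(path, "rb")
    try:
        while True:
            chunk = f.read(SIZE)
            if not chunk:
                break
            ids.append(struct.unpack(FMT, chunk)[0])
    finally:
        f.close()
    return ids, f.closed

# Write a small demo file (hypothetical data) and read it back both ways.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    for value in (7, -3, 2147483647):
        out.write(struct.pack(FMT, value))

with_result = read_ids_with(demo_path)
try_result = read_ids_try_finally(demo_path)
os.remove(demo_path)
```

Both functions return the same list and leave the file closed; the with version simply needs fewer lines to guarantee it.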

Alex Martelli
Thanks. Unfortunately we're limited to using Python 2.5 at the moment, how would this differ in that?
Adam Cobb
@Adam, just add `from __future__ import with_statement` at the start of the module.
Alex Martelli
In the array example you call fromfile with a value of 2000, should that not be 4, for the four byte integer? Or am I misunderstanding this function?
Adam Cobb
@Adam, `.fromfile(f, N)` reads up to `N` items (raising `EOFError` if it reads fewer than `N` because it hit end-of-file, but you just need to catch that; the items it did read are still added). The `array` instance already knows how many bytes each item takes because it knows it's an array of `l`s, i.e., signed C longs (4 bytes here, though the size of a C long is platform-dependent; `idlist.itemsize` reports it). Reading a few thousand items at a time (exact number doesn't matter, 2000 rather than 1000 or 3000 is just because you do have to pick an exact number ;-) is more efficient than reading one at a time (not a huge difference in performance, but, "waste not, want not" ;-).
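A small sketch of that chunked `fromfile` behaviour, assuming a throwaway temporary file of five items and a deliberately tiny chunk size of 2 so the final, short read is easy to see. The function name `read_all` and its parameters are hypothetical.

```python
import array
import os
import tempfile

def read_all(path, typecode="l", chunk_items=2):
    """Read a whole file into an array, chunk_items items per fromfile call."""
    items = array.array(typecode)
    with open(path, "rb") as f:
        while True:
            try:
                items.fromfile(f, chunk_items)
            except EOFError:
                # Fewer than chunk_items were left; whatever was read
                # has already been appended to items.
                break
    return items

# Write five items with the same typecode so the item size matches on any platform.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    array.array("l", range(5)).tofile(out)

result = read_all(demo_path)
os.remove(demo_path)
```

With five items and chunks of two, the third call finds only one item, appends it, and then raises `EOFError`, so nothing is lost. For a file written by another program, it seems worth checking `result.itemsize` against the size actually used on disk.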
Alex Martelli
Got ya, that makes sense, thanks :)
Adam Cobb
A: 

Initialize idList as an empty list (idList = []) before the loop, then change

idList = struct.unpack("=l", bytes_read)

to

idList.append(struct.unpack("=l", bytes_read)[0])
Sijin