views:

8812

answers:

4

In Python, how do I read a binary file and loop over each byte of that file?

+11  A: 
f = open("myfile", "rb")
try:
    byte = f.read(1)
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
finally:
    f.close()

By suggestion of chrispy:

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)

Note that the with statement is not available before Python 2.5. In Python 2.5 you need to enable it with a __future__ import:

from __future__ import with_statement

From Python 2.6 on, this is not needed.

Skurmedel
EOF? In Python? I think you meant "while len(byte) > 0:" or similar.
RichieHindle
Yes, I'm a bit damaged; I've been sitting in C for a while. :) But yes, that is one way to do it. read will return an empty string when there is no more input.
Skurmedel
The with statement would tidy up this code.
chrispy
It would. I don't know what version he's using though. I can include an example for clarity. Thanks for the suggestion.
Skurmedel
Just a couple of nit-pickish Python style things: it's common (and PEP 8 style) to rely on the fact that empty strings evaluate to False, so just "while byte: ..." would do. It's also common to use the "while True" idiom in Python so you don't have to repeat the f.read(1), for example: "while True: byte = f.read(1); if not byte: break ...".
benhoyt
benhoyt: Thanks for the suggestions, I suspected you could do something like that. Do you want me to add an example using that style too? Then the old example would sort of explain how the new one works.
Skurmedel
@benhoyt but then it'll quit when you read a zero
John Montgomery
Don't understand the sudden downvotes on this. Did I offend somebody?
Skurmedel
@John Montgomery, "it'll quit when you read a zero": no, it won't. You're reading characters, not integers, and no character value from '\x00' to '\xff' is ever False in Python. Only no character, as in '', will be False, and you'll get that only after exhausting your input.
Peter Hansen
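benhoyt's suggested style, written out in full, might look like the sketch below. The count_bytes function and the byte counting are hypothetical placeholders for "do stuff with byte", and the demo uses a temporary file instead of myfile. Because f.read(1) returns a one-byte string (a bytes object in Python 3), a zero byte such as b"\x00" is still truthy, so the loop stops only at end of file, as Peter Hansen points out.

```python
import os
import tempfile


def count_bytes(path):
    """Hypothetical example: count the bytes in a file, one read(1) at a time."""
    count = 0
    with open(path, "rb") as f:
        while True:
            byte = f.read(1)  # returns b"" only at end of file
            if not byte:      # a one-byte b"\x00" is still truthy
                break
            count += 1        # placeholder for "do stuff with byte"
    return count


# Demo: zero bytes in the middle of the file do not end the loop.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\x00\x00b")
try:
    print(count_bytes(tmp.name))  # prints 4
finally:
    os.remove(tmp.name)
```

On Python 3.8 and later, the same loop can also be written as "while byte := f.read(1):", which avoids both the repeated read and the explicit break.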
+6  A: 

If the file is not so big that holding it in memory is a problem:

with open("filename", "rb") as f:
    bytes_read = f.read()
for b in bytes_read:
    process_byte(b)

where process_byte represents some operation you want to perform on the passed-in byte.
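One detail worth knowing on Python 3: iterating over a bytes object yields integers from 0 to 255, whereas Python 2 yields one-character strings. The sketch below uses an in-memory bytes literal in place of the file contents and a hypothetical process_byte that just formats each value as two hex digits.

```python
def process_byte(b):
    """Hypothetical stand-in: on Python 3, b is an int from 0 to 255."""
    return format(b, "02x")


data = b"AB\x00"  # stands in for the result of f.read()
hex_values = [process_byte(b) for b in data]
print(hex_values)  # prints ['41', '42', '00']

# The zero byte arrives as the int 0, which is falsy, so avoid
# truth tests like "if b:" when deciding whether a byte is present.
```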

If you want to process a chunk at a time:

CHUNKSIZE = 8192  # bytes per read; any reasonable size works
with open("filename", "rb") as f:
    bytes_read = f.read(CHUNKSIZE)
    while bytes_read:
        for b in bytes_read:
            process_byte(b)
        bytes_read = f.read(CHUNKSIZE)
Vinay Sajip
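The chunked loop above can also be written with the two-argument form of the built-in iter(), which keeps calling f.read(CHUNKSIZE) until it returns the sentinel b"". The byte_histogram function below is a hypothetical example of "process_byte" (it counts how often each byte value occurs), written for Python 3, where each b is an int; the 8192-byte chunk size is an arbitrary choice.

```python
import os
import tempfile
from functools import partial

CHUNKSIZE = 8192  # bytes per read; an arbitrary choice


def byte_histogram(path):
    """Hypothetical example: count occurrences of each byte value (0-255)."""
    counts = [0] * 256
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(CHUNKSIZE) until it returns b""
        for chunk in iter(partial(f.read, CHUNKSIZE), b""):
            for b in chunk:  # b is an int on Python 3
                counts[b] += 1
    return counts


# Demo: 20000 bytes, so the file is read in several chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abca" * 5000)
try:
    counts = byte_histogram(tmp.name)
    print(counts[ord("a")], counts[ord("b")])  # prints 10000 5000
finally:
    os.remove(tmp.name)
```

Only one chunk is held in memory at a time, so this scales to files larger than RAM.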
+1  A: 

If you wanted to take a more functional approach, you could do something like...

from functools import reduce  # reduce is not a builtin in Python 3

# apply a function to each byte individually
def bytefunc(byte):
    # do something here
    pass

with open("file", "rb") as f:
    data = f.read()

# map() is lazy in Python 3, so consume it, e.g. with list()
results = list(map(bytefunc, data))

# or...

# accumulate something over the whole file
def addbytes(total, nextbyte):
    return total + nextbyte  # or something else

total = reduce(addbytes, data, 0)

But, as Vinay mentioned, these approaches read the whole file into memory first, so they can run out of memory on large files.

Mark Rushakoff
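The functional style does not have to read the whole file at once. A sketch, assuming Python 3: feed reduce an iterator that reads the file in chunks and flattens them into individual bytes with itertools.chain.from_iterable. The iter_bytes helper is hypothetical, and addbytes simply sums byte values here.

```python
import itertools
import os
import tempfile
from functools import partial, reduce


def iter_bytes(f, chunksize=8192):
    """Hypothetical helper: lazily iterate over the bytes of an open binary file."""
    chunks = iter(partial(f.read, chunksize), b"")  # stops when read() returns b""
    return itertools.chain.from_iterable(chunks)    # each item is an int on Python 3


def addbytes(total, nextbyte):
    return total + nextbyte  # or something else


# Demo: every byte value 0-255, repeated 100 times.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 100)
try:
    with open(tmp.name, "rb") as f:  # the file must stay open while reduce runs
        total = reduce(addbytes, iter_bytes(f), 0)
    print(total)  # prints 3264000
finally:
    os.remove(tmp.name)
```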
+5  A: 

This generator yields bytes from a file, reading the file in chunks:

def bytes_from_file(filename, chunksize=8192):
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunksize)
            if chunk:
                for b in chunk:
                    yield b
            else:
                break

# example:
for b in bytes_from_file('filename'):
    do_stuff_with(b)
codeape
+1. This is a useful solution if this is something you do often, I guess. I would probably change it so that bytes_from_file took a file-like object, though, so I could use it with all kinds of "streams".
Skurmedel
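A minimal sketch of that suggestion, assuming the only requirement is a binary object with a read(size) method: the hypothetical bytes_from_stream below is codeape's generator with the open() moved out, so the caller opens and closes the stream. That makes it work with real files, io.BytesIO, sockets wrapped with makefile("rb"), and similar objects.

```python
import io


def bytes_from_stream(stream, chunksize=8192):
    """Hypothetical variant: yield bytes from any binary file-like object."""
    while True:
        chunk = stream.read(chunksize)
        if not chunk:  # b"" means the stream is exhausted
            break
        for b in chunk:  # ints on Python 3, 1-char strings on Python 2
            yield b


# Works with an in-memory stream as well as a real file.
data = list(bytes_from_stream(io.BytesIO(b"\x01\x02\x03"), chunksize=2))
print(data)  # prints [1, 2, 3]
```

With a real file the call site would be: with open("filename", "rb") as f: for b in bytes_from_stream(f): do_stuff_with(b). Leaving the stream open is deliberate, since the generator did not open it.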