I have a file where the first byte contains encoded information. In MATLAB I can read the byte bit by bit with var = fread(file, 8, 'ubit1') and then retrieve each bit as var(1), var(2), etc.

Is there an equivalent bit reader in Python?

+4  A: 

The smallest unit you'll be able to work with is a byte. To work at the bit level you need to use bitwise operators.

x = 3
# Check if the 1st bit is set:
(x & 1) != 0
# Returns True

# Check if the 2nd bit is set:
(x & 2) != 0
# Returns True

# Check if the 3rd bit is set:
(x & 4) != 0
# Returns False
Brian R. Bondy
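The same mask tests can be applied to all eight bits of a byte in a loop. This is a minimal sketch, assuming Python 3 (the snippets in this thread are Python 2); the helper name `bits_of_byte` is hypothetical and not part of the answer above.

```python
def bits_of_byte(value):
    """Return the 8 bits of an integer in 0..255, lowest-order bit first."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    # Test each bit with a mask: 1, 2, 4, ..., 128.
    return [1 if value & (1 << position) else 0 for position in range(8)]


print(bits_of_byte(3))     # [1, 1, 0, 0, 0, 0, 0, 0]
print(bits_of_byte(0x04))  # [0, 0, 1, 0, 0, 0, 0, 0]
```

Reverse the list if you want the highest-order bit first.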
Do you mind adding more info, since the OP clearly seems like a beginner?
Bruno Brant
Sure. I'm coming from a MATLAB background and can't find a 'ubit1' typecode for Python. I've used the following: f = open('filename', 'rb'); var = f.read(1), which returns var as the string '\x04'. How do I get the binary representation of the string?
David
@David: I see, already covered by accepted answer.
Brian R. Bondy
+3  A: 

The Python wiki explains how.

Ignacio Vazquez-Abrams
Read it but couldn't figure out how to do this. I'm coming from a MATLAB background and can't find a 'ubit1' typecode for Python. I've used the following: f = open('filename', 'rb'); var = f.read(1), which returns var as the string '\x04'. How do I get the binary representation of the string?
David
`struct` or `int()` can convert it into a number for you. Then use the various methods given in the wiki.
Ignacio Vazquez-Abrams
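A short sketch of the conversion step mentioned in the comment above, assuming Python 3. `struct.unpack` with the `"B"` format code reads one unsigned byte; `int.from_bytes` is used here as the Python 3 counterpart of the `int()` suggestion, which is an assumption about what the commenter had in mind. The literal `b"\x04"` stands in for the result of `f.read(1)`.

```python
import struct

raw = b"\x04"  # stands in for what f.read(1) returns for the byte in the question

# "B" is the struct format code for a single unsigned byte.
(value,) = struct.unpack("B", raw)

# In Python 3, indexing a bytes object yields the integer directly.
same_value = raw[0]

# int.from_bytes also works; byte order does not matter for a single byte.
also_same = int.from_bytes(raw, "big")

print(value, same_value, also_same)  # 4 4 4
```

Once the byte is an integer, the bitwise operators apply to it as usual.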
Thanks for your help.
David
I believe the bitarray module would do what I need, but I get compile errors trying to install it on all my systems.
David
Just discovered the bitstring module, which does the trick and more: http://code.google.com/p/python-bitstring/
David
+2  A: 

You won't be able to read each bit one by one - you have to read it byte by byte. You can easily extract the bits out, though:

with open("myfile", 'rb') as f:
    # read one byte
    byte = f.read(1)
# convert the byte to an integer representation
byte = ord(byte)
# now convert to a string of 1s and 0s, left-padded with zeros to 8 digits
byte = bin(byte)[2:].rjust(8, '0')
# now byte contains a string with 0s and 1s, high-order bit first
for bit in byte:
    print bit
Daniel G
Tried it, and for the example where byte = '\x04' the code above returns '0b'.
David
@David OOPS! Let me fix that. Sorry.... (edit) Ok, it's fixed now. Should work.
Daniel G
Thanks, your code now gives byte = 100, which is the correct base 2 representation of ord('\x04') = 4, but shouldn't the byte read be '00000100'?
David
Sure, I'll add that really quickly (the problem is that it truncates leading zeros).
Daniel G
I realize I can pad the bits to get the representation once I have the binary value, but it just seems odd that I can't read the bits directly.
David
This is how file reads work in Python: the smallest thing you can read is an entire byte. Then, when you call bin(), it gives you the smallest possible representation of that number in binary (that is, with the fewest possible bits, rather than a fixed width like 8 or 32 bits). If you want all eight bits of each byte, you need to pad the result with leading zeros after calling bin().
Daniel G
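To illustrate the padding point, here is a small sketch assuming Python 3. It compares `bin()` with two built-in ways of getting a fixed 8-digit string; the variable names are only for illustration.

```python
value = 4

shortest = bin(value)             # prefix "0b" plus the fewest digits needed
padded = format(value, "08b")     # zero-padded to 8 binary digits, no prefix
zfilled = bin(value)[2:].zfill(8) # same result by stripping "0b" and padding
low_bit_first = padded[::-1]      # reversed, so the lowest-order bit comes first

print(shortest)       # 0b100
print(padded)         # 00000100
print(zfilled)        # 00000100
print(low_bit_first)  # 00100000
```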
Daniel, I appreciate your help. Thanks for the quick and straightforward solution.
David
A: 

There are two possible ways to return the i-th bit of a byte. The "first bit" could refer to the high-order bit or it could refer to the low-order bit.

Here is a function that takes a string and an index as parameters and returns the value of the bit at that location. As written, it treats the low-order bit as the first bit. If you want the high-order bit first, just uncomment the indicated line.

def bit_from_string(string, index):
    i, j = divmod(index, 8)

    # Uncomment this if you want the high-order bit first
    # j = 7 - j

    if ord(string[i]) & (1 << j):
        return 1
    else:
        return 0

The indexing starts at 0. If you want the indexing to start at 1, you can adjust index in the function before calling divmod.

Example usage:

>>> for i in range(8):
...     print i, bit_from_string('\x04', i)
...
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0

Now, for how it works:

A string is composed of 8-bit bytes, so first we use divmod() to break the index into two parts:

  • i: the index of the correct byte within the string
  • j: the index of the correct bit within that byte

We use the ord() function to convert the character at string[i] into an integer type. Then, (1 << j) computes the value of the j-th bit by left-shifting 1 by j. Finally, we use bitwise-and to test if that bit is set. If so return 1, otherwise return 0.

Daniel Stutzbach
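The divmod-and-shift approach described in the answer above can be sketched for Python 3 as follows. This is an adaptation under assumptions, not the answer's own code: the name `bit_from_bytes` and the `msb_first` flag are hypothetical, and the high-order-first case uses `7 - bit_index` so that the bit position stays within 0..7.

```python
def bit_from_bytes(data, index, msb_first=False):
    """Return bit number `index` (0-based) of a bytes object as 0 or 1."""
    # byte_index selects the byte, bit_index the bit within that byte.
    byte_index, bit_index = divmod(index, 8)
    if msb_first:
        bit_index = 7 - bit_index  # count from the high-order end instead
    # Indexing bytes in Python 3 already yields an int, so ord() is not needed.
    return (data[byte_index] >> bit_index) & 1


data = b"\x04\x80"
low_first = [bit_from_bytes(data, i) for i in range(16)]
high_first = [bit_from_bytes(data, i, msb_first=True) for i in range(16)]
print(low_first)   # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(high_first)  # [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```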
Got it! Thanks for the detail in your answer. I looked at the bit shift operators but couldn't see how they worked for this. Your answer helps clarify the bitwise operators and the approach. Thanks.
David
Daniel, thanks for the general solution above.
David
+1  A: 

Read the bits from a file, low bits first.

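def bits(f):
    # ord() turns each 1-character string read from the file into an integer
    byte_values = (ord(b) for b in f.read())
    for b in byte_values:
        for i in xrange(8):
            # shift bit i down to position 0 and mask off everything else
            yield (b >> i) & 1

# open in binary mode so the bytes are read unaltered
for b in bits(open('binary-file.bin', 'rb')):
    print b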
Paul Hankin
Tested this (btw, the byte is little endian). ord('\x04') returns 4, which should give the bit string '00000100', but using your code I get '000100000'.
David
Oops, I meant I get '00100000' with your code.
David
It gives low bits first (which is natural, since it also gives low bytes first). But if you want the other order, you can change `xrange(8)` to `reversed(xrange(8))`.
Paul Hankin
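Both bit orders discussed in the comments above can be offered by one generator. This is a minimal sketch assuming Python 3; the name `iter_bits`, the `msb_first` flag, and the 4096-byte chunk size are illustrative choices, and `io.BytesIO` stands in for a file opened with `open('binary-file.bin', 'rb')`.

```python
import io


def iter_bits(stream, msb_first=False):
    """Yield the bits of a binary stream one at a time, byte by byte."""
    positions = range(7, -1, -1) if msb_first else range(8)
    while True:
        chunk = stream.read(4096)  # read in blocks rather than the whole file
        if not chunk:
            break
        for byte in chunk:  # iterating over bytes yields ints in Python 3
            for position in positions:
                yield (byte >> position) & 1


low_first = list(iter_bits(io.BytesIO(b"\x04")))
high_first = list(iter_bits(io.BytesIO(b"\x04"), msb_first=True))
print(low_first)   # [0, 0, 1, 0, 0, 0, 0, 0]
print(high_first)  # [0, 0, 0, 0, 0, 1, 0, 0]
```

The low-bits-first order is the one reported in this thread to match the MATLAB `'ubit1'` read of the same file.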
Tested against the MATLAB code reading the file, and your code correctly returns the same bit string from the data file. The byte converted to a bit string is '00100000'. Not sure why the conversion in Daniel G's answer is off, since it makes sense.
David