I am opening up a binary file like so:

file = open("test/test.x", 'rb')

and reading in lines to a list. Each line looks a little like:

'\xbe\x00\xc8d\xf8d\x08\xe4.\x07~\x03\x9e\x07\xbe\x03\xde\x07\xfe\n'

I am having a hard time manipulating this data. If I try to print each line, Python freezes and emits beeping noises (I think there's a binary beep code in there somewhere). How do I go about using this data safely? How can I convert each hex number to decimal?

+1  A: 

You are trying to print the data converted to ASCII characters, which will not work.

You can safely use any byte of the data. If you want to print it as hexadecimal, look at the functions ord and hex.
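A minimal sketch of that idea, using a few bytes from the question as sample data. It goes through bytearray rather than ord, since iterating a bytearray yields integers on both Python 2 and Python 3; the variable names are just for illustration.

```python
# A few bytes taken from the line shown in the question.
data = b'\xbe\x00\xc8d\x07'

# bytearray iterates as integers, so no ord() call is needed here.
byte_values = list(bytearray(data))

# hex() turns each integer into its hexadecimal text form.
hex_values = [hex(b) for b in byte_values]

print(byte_values)  # [190, 0, 200, 100, 7]
print(hex_values)   # ['0xbe', '0x0', '0xc8', '0x64', '0x7']
```

Printing the integers or their hex text never sends the raw control characters to the terminal, so nothing beeps.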

Yann Ramin
+5  A: 

To print it, you can do something like this:

print repr(data)

For the whole thing as hex:

print data.encode('hex')

For the decimal value of each byte:

print ' '.join([str(ord(a)) for a in data])

To unpack binary integers, etc. from the data as if it originally came from a C-style struct, look at the struct module.
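As a rough sketch of what that looks like: the layout below (one little-endian unsigned 16-bit integer followed by two unsigned bytes) is purely an assumption for illustration, since the question does not say how the file is actually structured.

```python
import struct

# First four bytes of the line shown in the question.
header = b'\xbe\x00\xc8d'

# Hypothetical layout: '<' = little-endian, no padding;
# 'H' = unsigned 16-bit integer, 'B' = unsigned 8-bit integer.
layout = '<HBB'

# calcsize reports how many bytes the format string consumes.
size = struct.calcsize(layout)

fields = struct.unpack(layout, header[:size])
print(size)    # 4
print(fields)  # (190, 200, 100)
```

With the real layout in hand, only the format string would need to change.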

Forest
Thank you! This is what I was looking for!
Dominic Bou-Samra
+2  A: 

Like theatrus mentioned, ord and hex might help you. If you want to try to interpret some sort of structured binary data in the file, the struct module might be helpful.

Mattias Nilsson
+1 for struct. Right way to go to interpret packed binary data.
Noufal Ibrahim
+2  A: 

\xhh is the character with hex value hh. Other characters, such as '.' and '~', are ordinary printable characters.

Iterating on a string gives you the characters in it, one at a time.

ord(c) will return an integer representing the character. E.g., ord('A') == 65.

This will print the decimal numbers for each character:

s = '\xbe\x00\xc8d\xf8d\x08\xe4.\x07~\x03\x9e\x07\xbe\x03\xde\x07\xfe\n'
print ' '.join(str(ord(c)) for c in s)
Jon-Eric
Note that \x07 is the ASCII BEL character. That's what's causing the beeping.
dan04
+1  A: 

Are you using read() or readline()? You should be using read(n) to read n bytes; readline() will read until it hits a newline, which the binary file might not have.

In either case, though, you get back a string of bytes, which may contain printable and non-printable characters and is probably not very useful to print as-is.

What you want is ord(), which converts a one-byte string into the corresponding integer value. Read the file one byte at a time with read(1) and call ord() on the result, or read a larger block and iterate through the entire string.
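Here is one way that could look, reading fixed-size chunks instead of lines. The helper name read_byte_values and the chunk size are made up for this sketch, and it writes a small temporary file so it can run on its own; bytearray is used in place of ord so the same code works on Python 2 and Python 3.

```python
import os
import tempfile


def read_byte_values(path, chunk_size=4096):
    """Return every byte of the file as an integer in the range 0-255."""
    values = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)  # read(n) ignores newlines entirely
            if not chunk:               # empty result means end of file
                break
            values.extend(bytearray(chunk))  # bytearray iterates as integers
    return values


# Demo: write a few sample bytes to a temporary file and read them back.
sample = b'\xbe\x00\xc8d\x07\n'
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(sample)

demo_values = read_byte_values(demo_path, chunk_size=4)
os.remove(demo_path)

print(demo_values)  # [190, 0, 200, 100, 7, 10]
```

Note that the trailing newline comes back as the value 10, like any other byte.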

Paul Richter
+1  A: 

Binary data is rarely divided into "lines" separated by '\n'. If it is, it will have an implicit or explicit escape mechanism to distinguish between '\n' as a line terminator and '\n' as part of the data. Reading such a file as lines blindly without knowledge of the escape mechanism is pointless.

To answer your specific concerns:

'\x07' is the ASCII BEL character, which was originally for ringing the bell on a teletype machine.

You can get the integer value of a byte 'b' by doing ord(b).

HOWEVER, to process binary data properly, you need to know what the layout is. You can have signed and unsigned integers (of sizes 1, 2, 4, 8 bytes), floating point numbers, decimal numbers of varying lengths, fixed-length strings, variable-length strings, etc. An added complication is whether the data is recorded in big-endian or little-endian fashion. Once you know all of the above (or have very good informed guesses), the Python struct module should be usable for all or most of your processing; the ctypes module may also be useful.
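To illustrate why the layout matters, here is a small sketch that decodes the same two bytes (taken from the sample line in the question) four different ways. Which interpretation is right, if any, depends on the actual format, which is not known here.

```python
import struct

# Two bytes from the sample line in the question.
raw = b'\x08\xe4'

# '<' = little-endian, '>' = big-endian;
# 'H' = unsigned 16-bit, 'h' = signed 16-bit.
little_unsigned = struct.unpack('<H', raw)[0]
big_unsigned = struct.unpack('>H', raw)[0]
little_signed = struct.unpack('<h', raw)[0]
big_signed = struct.unpack('>h', raw)[0]

print(little_unsigned)  # 58376
print(big_unsigned)     # 2276
print(little_signed)    # -7160
print(big_signed)       # 2276
```

Three different numbers come out of the same two bytes, so guessing the byte order or signedness wrong silently produces garbage rather than an error.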

Does the data format have a name? If so, tell us; we may be able to point you to code or docs.

You ask "How do I go about using this data safely?", which raises the question: What do you want to use it for? What manipulations do you want to do?

John Machin