My work requires that I perform a mathematical simulation whose parameters come from a binary file. The simulator can read such a binary file without a problem.

However, I need to peek inside the binary file to make sure the parameters are what I need them to be, and I cannot seem to do it.

I would like to write a script in Python that reads in the binary file, searches for the parameters I care about, and displays their values.

What I know about the binary file:

It represents simple text (as opposed to an image or sound file). There is a piece of code that can "dump" the file into a readable format: if I open that dump in Emacs I will find things like:

CENTRAL_BODY = 'SUN'

The whole file is just a series of similar instructions. I could use that dump code, but I would much rather have Python do that.

This seems to be a very trivial question, and I apologize for not knowing better. I thought I was a proficient programmer!

Many thanks.

+4  A: 

You can read the file's content into a string in memory:

thedata = open(thefilename, 'rb').read()

and then locate a string in it:

where = thedata.find(b'CENTRAL_BODY')  # bytes literal, since the file was read in binary mode

and finally slice off the part you care about:

thepart = thedata[where:where+50]  # or whatever length

and display it as you prefer (e.g. find the string value by locating within thepart an = sign, then the first following quote, then the next quote after that).
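The steps above can be put together roughly as follows. This is only a sketch: the function name `extract_quoted_value` and the sample bytes are made up for illustration, and it assumes the value really is stored as quoted text right after the parameter name, which may not be true of the actual file.

```python
def extract_quoted_value(data, name, window=50):
    """Return the bytes between the first pair of quotes after `name`, or None."""
    where = data.find(name)
    if where == -1:
        return None
    # Slice off the part we care about (the window length is a guess).
    part = data[where:where + window]
    eq = part.find(b"=")
    if eq == -1:
        return None
    first = part.find(b"'", eq + 1)        # first quote after the '=' sign
    if first == -1:
        return None
    second = part.find(b"'", first + 1)    # the next quote after that
    if second == -1:
        return None
    return part[first + 1:second]


# Made-up sample bytes standing in for the real file contents.
sample = b"\x00\x01CENTRAL_BODY = 'SUN'\x00\x02"
print(extract_quoted_value(sample, b"CENTRAL_BODY"))
```

If the function returns None for a name you know is in the file, that is a hint that the value is not stored as plain text next to the name.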

Alex Martelli
Thank you. This approach lets me find a specific word (the parameter name), but the order in the binary file and in the text dump seems to be completely different. When I slice the part, the value should be within, say, 50 characters of the beginning. I do that, and I get the other parameter names instead. I have this in the text file:
GM = 123.456
AU = 567.890
I open the binary file, look for 'GM', and slice from x to x+12. What I get is (Python dump):
GM AU ... [bunch of weird characters]
So it does find the strings, but the order is different. I guess the format is more complicated than I thought.
Arrieta
@arrieta, I went w/your description of course -- maybe the numbers are in binary format too (not text), etc, etc.
Alex Martelli
What if "CENTRAL_BODY" is just assumed, and the value he has to look at is "SUN"? Or, what if it's scrambled somehow? (e.g. rot13)
hasen j
If you use thedata.__repr__() to do your searches instead of thedata, you won't see weird characters. I use that trick for file format fuzzing so I can get fuzzy matches of byte sequences with regular expressions.
twneale
A: 

If it's a binary file, you will need to use the struct module. You will need to know how the data is formatted in the file. If that is not documented, you will have to reverse engineer it.
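As an illustration of what using struct looks like, here is a minimal sketch. The record layout is purely hypothetical (an 8-byte NUL-padded name followed by a little-endian double); the real file almost certainly differs, and working out its layout is the reverse-engineering part.

```python
import struct

# Hypothetical layout, NOT the real format:
# 8-byte name padded with NULs, then a little-endian 8-byte float.
RECORD = struct.Struct("<8sd")


def unpack_record(buf, offset=0):
    """Decode one (name, value) record starting at `offset`."""
    raw_name, value = RECORD.unpack_from(buf, offset)
    return raw_name.rstrip(b"\x00").decode("ascii"), value


# Build a sample record so the sketch can be run without the real file.
sample = RECORD.pack(b"GM", 123.456)
name, value = unpack_record(sample)
print(name, value)
```

Once the real layout is known, only the format string and the field handling should need to change.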

Do you have the source code of the other dumping program? You may be able to just port that to Python.

We can probably help you better if we can see what the binary file and the corresponding dump look like.

gnibbler
I asked around and two things are now clear: (1) the 'dump' code is not available, only the executable, and (2) if I want to do this in Python I need to reverse engineer the binary format. What a pain! Thanks for your answer!
Arrieta
+1  A: 

It sounds like this 'dump' program already does what you need: interpreting the binary file. My approach would be to write a Python program that takes the dumped output, extracts the parameters you want, and displays them.

You could parse the dump with something like this:

myparms.py:

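import sys

# Collect NAME = VALUE pairs from the dumped text on stdin.
d = {}
for line in sys.stdin:
    parts = line.split("=", 1)  # split on the first '=' only
    if len(parts) < 2:
        continue  # skip lines without an '='
    k = parts[0].strip()
    v = parts[1].strip()
    d[k] = v

print(d['CENTRAL_BODY'])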

Use this like:

dump parameters.bin | python myparms.py

You didn't mention a platform or provide details about the dump'ed format, but this should be a place to start.

Mark Peters
A: 

You have to know the format the data is stored in; there's simply no way around that.

If there's no written spec for it, try opening the file in a hex editor and studying the format, using the text dump as a reference. If you can get the source code for the tool that creates the text dumps, that would help you a lot.

Keep in mind that the data could be scrambled in some way or another, e.g. rot13.
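If no hex editor is at hand, a small hex dump can be produced in Python itself. This is a sketch only; the `hexdump` helper and the sample bytes are made up for illustration.

```python
def hexdump(data, width=16):
    """Return lines of 'offset  hex bytes  printable text' for `data`."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = bytearray(data[offset:offset + width])
        hex_part = " ".join("%02x" % b for b in chunk)
        # Show printable ASCII as-is and everything else as '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %s  %s" % (offset, hex_part.ljust(width * 3 - 1), text_part))
    return lines


# Made-up bytes; with the real file you would pass open(thefilename, 'rb').read().
for line in hexdump(b"GM\x00\x00AU\x00\x00" + b"\x40\x5e\xdd\x2f"):
    print(line)
```

Lining this output up against the text dump makes it easier to spot where names end and where binary-encoded values begin.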
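One quick way to test the rot13 guess is to search the raw bytes for the rot13 form of a name you expect to find. This is a sketch with a hypothetical helper `find_rot13` and made-up sample data; rot13 is only one of many possible scramblings.

```python
import codecs


def find_rot13(data, name):
    """Return the offset of the rot13-encoded `name` in `data`, or -1."""
    encoded = codecs.encode(name, "rot_13").encode("ascii")
    return data.find(encoded)


# Made-up sample: "CENTRAL_BODY = 'SUN'" with its letters rot13-scrambled.
sample = b"\x00PRAGENY_OBQL = 'FHA'"
print(find_rot13(sample, "CENTRAL_BODY"))
```

A result of -1 for every expected name suggests that rot13 is not the scheme in use.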

hasen j