What tools are available to aid in decoding unknown binary data formats?

I know Hex Workshop and 010 Editor both support structures. These work to a limited extent for a known, fixed format, but they become difficult to use with anything more complicated, especially unknown formats. I guess I'm looking for a module for a scripting language, or a scriptable GUI tool.

For example, I'd like to be able to find a structure within a block of data from limited known information, perhaps a magic number. Once I've found a structure, I'd like to follow known length and offset words to find other structures, and then repeat this recursively and iteratively where it makes sense.
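To make that concrete, here is a rough Python sketch of the kind of traversal I mean. The record layout is entirely hypothetical (a 4-byte magic `CHNK`, a 32-bit payload length, and a 32-bit absolute offset of the next record, all little-endian); a real format would need its own header definition.

```python
import struct

MAGIC = b"CHNK"                  # hypothetical magic number
HEADER = struct.Struct("<4sII")  # assumed layout: magic, payload length, next offset (0 = none)


def find_magic(data: bytes, magic: bytes = MAGIC):
    """Yield every offset at which the magic number occurs."""
    pos = data.find(magic)
    while pos != -1:
        yield pos
        pos = data.find(magic, pos + 1)


def follow_chain(data: bytes, start: int):
    """Follow next-offset words from `start`, yielding (offset, payload) pairs."""
    seen = set()  # guards against offset loops in corrupt or misread data
    offset = start
    while offset not in seen and 0 <= offset <= len(data) - HEADER.size:
        seen.add(offset)
        magic, length, next_offset = HEADER.unpack_from(data, offset)
        if magic != MAGIC:
            break
        body_start = offset + HEADER.size
        yield offset, data[body_start:body_start + length]
        if next_offset == 0:
            break
        offset = next_offset
```

Every hit from `find_magic` is only a candidate; passing it to `follow_chain` and seeing whether the chain stays inside the file is a cheap plausibility check.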

In my dreams, perhaps even automatically identify possible offsets and lengths based on what I've already told the system!

+5  A: 

My own tool "iBored", which I released just recently, can do parts of this. I wrote the tool to visualize and debug file system formats (UDF, HFS, ISO9660, FAT etc.), and implemented search, copy and, later, structure and template support. The structure support is pretty straightforward, and the templates are a way to identify structures dynamically.

The entire thing is programmable in a Visual Basic dialect, which lets you test values, read specific blocks, and so on.

The tool is free and works on all platforms (Win, Mac, Linux), but as it's a personal tool that I only recently released to the public, it's not well documented.

However, if you want to give it a try and would like to give feedback, I might add more useful features.

I'd even open source it, but as it's written in REALbasic, I doubt many people will join such a project.

Link: iBored home page

Thomas Tempelmann
Sounds like a hell of a nice project to join... When I was working as an antivirus researcher, this would have been really handy. Instead, I did mine all by hand... I'm gonna download it and check it out. Thank you for this, I have a use for it. :)
LarryF
+1  A: 

Here are some tips that come to mind:

From my experience, interactive scripting languages (I use Python) can be a great help. You can write a simple framework to deal with binary streams and some simple algorithms. Then you can write scripts that will take your binary and check various things. For example:

Do some statistical analysis on various parts. Data that looks random, for example, suggests that the part is probably compressed or encrypted. Runs of zeros may mean padding between parts. Scattered zeros may mean integer values or Unicode strings, and so on. Try to spot various offsets. Try to convert parts of the binary into 2- or 4-byte integers or into floats, print them and see if they make sense. Write some functions that will search for repeating or very similar parts in the data; this way you can easily spot headers.

Try to find as many strings as possible, and try different encodings (C strings, Pascal strings, UTF-8/16, etc.). There are some good tools for that (I think Hex Workshop has such a tool). Strings can tell you a lot.
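As a minimal sketch of the statistical part, the helpers below compute Shannon entropy per block and decode a run of 32-bit integers. The function names, the 256-byte block size and the little-endian assumption are my own choices for illustration, not part of any particular tool.

```python
import math
import struct
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for a uniform byte mix."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(data: bytes, block_size: int = 256):
    """Yield (offset, entropy) for each consecutive block of the input."""
    for offset in range(0, len(data), block_size):
        yield offset, shannon_entropy(data[offset:offset + block_size])


def as_uint32_le(data: bytes, offset: int, count: int):
    """Decode `count` little-endian unsigned 32-bit integers starting at `offset`."""
    return list(struct.unpack_from(f"<{count}I", data, offset))
```

Blocks with entropy close to 8 bits per byte are candidates for compressed or encrypted data, blocks near 0 are likely padding, and values from `as_uint32_le` that are smaller than the file size are worth testing as offsets or lengths.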
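If you'd rather script the string search yourself, something like the following is a starting point. It is only a sketch: it looks for printable ASCII, for the same characters stored as UTF-16LE, and for Pascal-style strings with a one-byte length prefix. The minimum run length of 4 is an arbitrary assumption.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("ascii")


def utf16le_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII stored as UTF-16LE."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("utf-16-le")


def pascal_string_at(data: bytes, offset: int):
    """Treat data[offset] as a length byte; return the text if it is printable ASCII."""
    if offset >= len(data):
        return None
    length = data[offset]
    body = data[offset + 1:offset + 1 + length]
    if length and len(body) == length and all(0x20 <= b <= 0x7E for b in body):
        return body.decode("ascii")
    return None
```

The offsets are as useful as the text itself: a string that always sits at the same distance from some other value is a strong hint about the layout of a header.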

Good luck!

Untrots
+2  A: 

The AXE editor has a few features that might be handy for this that I haven't seen in other hex editors.

Apart from structures, which Hex Workshop probably does better, it also has scripts, a regularity finder and a grammar generator. Of these, the only one I've used is the regularity finder, which sets the line length based on any structure it can find. The other two sound like they could be useful, but I haven't tried them myself.

The most interesting feature, however, and the one I've had the most use for is its graphical view mode. That basically just shows you the file with each byte turned into a color-coded pixel. And as simple as that sounds, it has made my reverse-engineering attempts a lot easier at times.
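If you don't have AXE at hand, a rough equivalent is easy to script. The Python sketch below is not how AXE does it: it maps each byte to one grayscale pixel (rather than a color code) and returns a binary PGM image, which most image viewers can open. The function name and the default width of 256 are assumptions.

```python
def bytes_to_pgm(data: bytes, width: int = 256) -> bytes:
    """Return a binary PGM (P5) image with one grayscale pixel per input byte."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, -(-len(data) // width))  # ceiling division, at least one row
    padded = data.ljust(width * height, b"\x00")  # fill the last row with black
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + padded
```

Writing the result to a file with `open("dump.pgm", "wb")` and trying a few widths often makes fixed-size records show up as vertical stripes once the width matches the record length.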

I suppose doing it by eye is quite the opposite of doing automatic analysis, though, and the graphical mode won't be much use for finding and following offsets...

mercator
+1  A: 

Tupni: to my knowledge it is not directly available outside Microsoft Research, but there is a paper about the tool that may be of interest to someone wanting to write a similar program (perhaps open source):

Tupni: Automatic Reverse Engineering of Input Formats (@ ACM digital library)

Abstract

Recent work has established the importance of automatic reverse engineering of protocol or file format specifications. However, the formats reverse engineered by previous tools have missed important information that is critical for security applications. In this paper, we present Tupni, a tool that can reverse engineer an input format with a rich set of information, including record sequences, record types, and input constraints. Tupni can generalize the format specification over multiple inputs. We have implemented a prototype of Tupni and evaluated it on 10 different formats: five file formats (WMF, BMP, JPG, PNG and TIF) and five network protocols (DNS, RPC, TFTP, HTTP and FTP). Tupni identified all record sequences in the test inputs. We also show that, by aggregating over multiple WMF files, Tupni can derive a more complete format specification for WMF. Furthermore, we demonstrate the utility of Tupni by using the rich information it provides for zeroday vulnerability signature generation, which was not possible with previous reverse engineering tools.

MaD70