This extends the question "Tools to help reverse engineer binary file formats".

Are there any publicly available tools that use clustering and/or data mining techniques to reverse engineer file formats?

For example, you would give the tool a collection of files that share the same format, and its output would be their generic structure.

+3  A: 

If a binary encoding is truly efficient (compressed ZIP data is an example), the information content of each bit is high. Essentially, the data will look like random noise.

You can't infer anything from that without additional knowledge.
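To make the "looks random" point concrete, here is a minimal sketch that measures the Shannon entropy of a byte stream before and after compression. The helper name `byte_entropy`, the word list, and the synthetic input are assumptions made up for illustration, and zlib's DEFLATE stands in for the ZIP case.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical "inefficient" input: words drawn from a small vocabulary.
rng = random.Random(0)
vocab = ["alpha", "beta", "gamma", "delta", "header", "record", "field",
         "length", "offset", "value", "name", "type", "flags", "count"]
raw = " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")

# DEFLATE removes most of the redundancy, so the output bytes look near-uniform.
packed = zlib.compress(raw, 9)

raw_entropy = byte_entropy(raw)
packed_entropy = byte_entropy(packed)
print(f"raw:    {len(raw)} bytes, {raw_entropy:.2f} bits/byte")
print(f"packed: {len(packed)} bytes, {packed_entropy:.2f} bits/byte")
```

The closer the figure gets to 8 bits per byte, the less structure a purely statistical tool has to work with. Note that a byte histogram is only a rough proxy: it says nothing about the order of the bytes.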

If the binary encoding isn't efficient, in theory, you have some faint chance of seeing structure. But this still sounds really hard; how do you even begin guessing where the boundaries of fields are?
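As a rough sketch of what "seeing structure" could mean when you have a collection of files: compare them offset by offset and mark which byte positions never change. Everything here is an assumption for illustration, including the made-up 12-byte layout (magic, version, id, padding) and the names `make_sample`, `constant_mask`, and `candidate_fields`. It also only works when the fields sit at fixed offsets.

```python
import struct
from itertools import groupby


def make_sample(i: int) -> bytes:
    """Build one file of a made-up format: magic, version, id, padding."""
    magic = b"FMT1"                      # 4 bytes, same in every file
    version = struct.pack("<H", 2)       # 2 bytes, same in every file
    ident = struct.pack("<I", (i * 2654435761) & 0xFFFFFFFF)  # 4 bytes, varies
    padding = b"\x00\x00"                # 2 bytes, same in every file
    return magic + version + ident + padding


def constant_mask(samples):
    """For each offset, True if every sample has the same byte there."""
    width = min(len(s) for s in samples)
    return [len({s[i] for s in samples}) == 1 for i in range(width)]


def candidate_fields(samples):
    """Group adjacent offsets of the same kind into (offset, length, kind) runs."""
    fields = []
    offset = 0
    for is_const, run in groupby(constant_mask(samples)):
        length = len(list(run))
        fields.append((offset, length, "constant" if is_const else "variable"))
        offset += length
    return fields


samples = [make_sample(i) for i in range(1, 21)]
fields = candidate_fields(samples)
for offset, length, kind in fields:
    print(f"offset {offset:2d}, length {length}, {kind}")
```

Even on this toy input, the magic and the version merge into a single 6-byte constant run, because nothing in the data marks the boundary between them. That is the field-boundary problem in miniature.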

The AI and machine learning people will tell you that you can't learn anything unless you already "almost" know it. Often they succeed by encoding the problem in tokens that you can at least reason about.

I don't think you can do this without providing more information. Do you know anything about the file formats? Are field sizes always less than N bits? Are only ASCII strings encoded, or vice versa?

Ira Baxter
I just wanted to find out whether there are already tools out there that make general guesstimates of what the format is. I don't need this to solve a problem; it's just out of curiosity.
monksy
Steven was pointing out why there are not. It's not a general problem.
Simeon Pilgrim