How does one develop software to read a proprietary file type without having the proprietary software itself? Something like what the OpenOffice folks did with MS Word (.doc) files: OpenOffice can read .doc files.

That might be easy if the proprietary software has an open source SDK. For example, Adobe has the open source Flex SDK, so it's possible to create Flash (.swf) files without having Adobe Flash. But in the case of MS Word, which I believe had no open source SDK, how did the OpenOffice developers get their software to read the format?

Of course I'm using OpenOffice just as an example; my question is general. How could one read a proprietary output file? What's the idea here? I know someone will say reverse engineering, but I don't think reverse engineering the entire software makes sense here (not that I know anything about that field yet), because the goal is not to create software with the same functionality. Is there a way to work with the output file only?

Any thoughts on this?

A: 

If you are lucky, at least some information on the file format is available; for example, MS does have information on the .doc file format.

Otherwise it is a lot of work. Basically, you make a simple document and save it, then make a small change, save it again, and compare the two files. Eventually you can figure out the format.
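To illustrate the save, change, and compare idea, here is a minimal Python sketch. The function name `diff_bytes` and the file names in the usage comment are made up for the example, and it assumes the change does not shift the rest of the file: if the edit inserts or removes bytes, everything after that point will show up as different.

```python
from itertools import zip_longest


def diff_bytes(a: bytes, b: bytes):
    """Return (offset, old_bytes, new_bytes) for each run where a and b differ.

    Compares position by position, so it is only informative when the
    change did not move the remaining data to different offsets.
    """
    runs = []
    start = None  # offset where the current differing run began
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, a[start:i], b[start:i]))
            start = None
    if start is not None:
        # The files differ all the way to the end (or one is longer).
        runs.append((start, a[start:], b[start:]))
    return runs


# Hypothetical usage with two saves of the same document:
#   from pathlib import Path
#   before = Path("before.doc").read_bytes()
#   after = Path("after.doc").read_bytes()
#   for offset, old, new in diff_bytes(before, after):
#       print(f"{offset:#010x}: {old.hex()} -> {new.hex()}")
```

Keeping each change as small as possible (one character, one style toggle) makes it more likely that only a handful of bytes differ, which is what makes a guess about their meaning feasible.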

Jim C
+1  A: 

It's an iterative process:

  • Inspect the stream of raw bytes in the file and make a guess as to what they mean
  • Write code to verify the guess
  • See what goes wrong when you try to load the file
  • Repeat
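As a rough sketch of one round of that loop in Python: dump the bytes, write the guess down as code, and let the code complain when a file contradicts it. The layout guessed here (a 4-byte magic `DEMO`, a little-endian 16-bit version, a 32-bit payload length) is entirely invented for illustration and does not describe any real format; `hexdump` and `check_guess` are hypothetical helper names.

```python
import struct


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{width * 3}} {text}")
    return "\n".join(lines)


# The current guess (an assumption for this example, not a real format):
# 4-byte magic, uint16 version, uint32 payload length, all little-endian.
HEADER = struct.Struct("<4sHI")


def check_guess(data: bytes) -> dict:
    """Parse data according to the guess; raise ValueError if it does not fit."""
    if len(data) < HEADER.size:
        raise ValueError("file is shorter than the guessed header")
    magic, version, length = HEADER.unpack_from(data, 0)
    if magic != b"DEMO":
        raise ValueError(f"unexpected magic {magic!r}")
    if HEADER.size + length != len(data):
        raise ValueError(
            f"length field says {length}, "
            f"but {len(data) - HEADER.size} bytes follow the header"
        )
    return {"version": version, "payload": data[HEADER.size:]}
```

Running `check_guess` over every test file and looking at which ones raise, and why, is what drives the next guess.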

You'll need a wide variety of test files, a lot of patience and large dollops of insight.

My experience is that it's pretty easy to handle the basics, but that complex file format features can be a pain to handle.

Bevan