Hi there,

I've been looking on certain sites for some time now, but I can't seem to find anything usable about file formats.

There is a certain file format on my computer that I want to re-create so I can make add-ons for a program. Unfortunately, I would be the first to do so for that format, which makes it all the harder. There are programs that add information to the file, but unfortunately they are not open source. Still, their existence means it must be possible to figure out the file format somehow.

The closest I came to finding usable information about re-creating a file format was: "open it in notepad or a hex editor, and see if you can find anything usable".

This file format contains information, so it's nothing like music files or images, in case you're wondering.

I'm just wondering if there is any guide on how to create a file format, or on figuring out how an existing file format works. I believe this sort of format is called a tabulated data format?

Thank you for your time.

(I realise that was an awful lot of text for just one question.)

+2  A: 

It really does depend on the file format.

Ideally, you find some documentation on how the file works, and use that. This is easy if the file uses a public format, so for HTML files or PNG files you can easily find that information. Proprietary formats often have published specs too, or at least a publicly available API for manipulating them, depending on whether the company's policy is to actively encourage this sort of extension.

Next best is using examples of working code (whether published source or reverse engineered in itself) that deal with the file as a reference implementation.

Otherwise, reverse engineering is the best you can do. Opening the file in both notepad and a hex editor is indeed the way to go: even with a binary format, looking at it parsed as text can tell you something, and even with a text-based format, looking at it in a hex editor can tell you whether it makes use of non-printable characters. It's a detective job, sometimes easy but often very hard, especially since you may miss how the format deals with edge cases that aren't hit in the samples you use.
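
If no hex editor is to hand, the same side-by-side view (hex on the left, printable text on the right) can be produced with a few lines of Python. This is only a rough sketch: the `hex_dump` helper and the sample bytes are made up for illustration and have nothing to do with your actual file.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." text
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else (including 00) as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample data; for a real file use open(path, "rb").read()
sample = b"Name\x00Sword\x00Damage\x0012\x00"
print(hex_dump(sample))
```

Dumping two files that differ in only one known value and comparing the output is usually the quickest way to find where that value is stored.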

Jon Hanna
Alright, thanks a lot for your response. I tried opening the file with notepad and a hex editor, but I couldn't find anything I could use. I just saw the information I would normally see with the editor tool made by the reverse engineers, so no weird characters, except for 00 which displays as [], but I guess that's sort of normal in a hex editor. I guess my best shot would be to ask for help on the forum where they distributed the tools. Thanks again!
Nick
00 would be the "null char". It is likely used as some sort of separator precisely because it doesn't appear in most text, or perhaps because the program using it is written in C or a similar language that terminates its strings with it. In that case, just writing each string to the file including the terminator allows it to be read back again without having to encode the length in the file.
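
If that guess is right, pulling the strings back out is mostly a matter of splitting on the 00 bytes. A minimal sketch in Python, assuming the data really is nothing but null-terminated ASCII strings (the function name and the sample bytes are hypothetical, not taken from your file):

```python
def split_null_terminated(data: bytes, encoding: str = "ascii") -> list[str]:
    """Split a buffer of null-terminated strings into a list of Python strings."""
    fields = data.split(b"\x00")
    # A trailing terminator leaves one empty piece at the end; drop it
    if fields and fields[-1] == b"":
        fields.pop()
    # errors="replace" keeps going if some bytes turn out not to be text
    return [field.decode(encoding, errors="replace") for field in fields]


# Hypothetical sample; for a real file use open(path, "rb").read()
print(split_null_terminated(b"Name\x00Sword\x00Damage\x0012\x00"))
```

If the output comes back with a regular pattern (for instance every third entry is a number), that is a good hint about how the records are laid out.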
Jon Hanna
A: 

The difficulty with obscure formats distributed with games is that they are often compiled from either a declarative definition language, a scripting language or directly from a set of resources like textures and meshes.

In some games, one compiled file will contain bits and pieces of all of the above, with no available documentation on the tools and formats used to piece it together. Some people call that "fun".

If you can't get anything from the hex, can't find any documentation and can't find a tool to read the file, you're probably best off asking the community to see if anyone is familiar with the technology.

IanGilham