I am trying to deserialize an old file format that was serialized in Delphi using binary serialization. I know nothing about the structure of the file except for a few very high-level records that are in it.

What steps would you take to solve this problem? Any tools etc?

+3  A: 

A good hex editor, plus your own gray matter to identify the structures.
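If no hex editor is at hand, the view it gives you is easy to approximate. Below is a minimal Python hexdump sketch (the script and its argument order are made up for illustration). It prints offsets, hex bytes and an ASCII column, and that ASCII column is where readable strings such as record or field names tend to jump out.

```python
import sys


def hexdump(data: bytes, start: int = 0, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex columns, ASCII column."""
    rows = []
    for pos in range(0, len(data), width):
        chunk = data[pos:pos + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.', so text stands out.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{start + pos:08x}  {hex_part:<{width * 3}} {text}")
    return "\n".join(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python hexdump.py FILE [OFFSET] [LENGTH]
    path = sys.argv[1]
    offset = int(sys.argv[2], 0) if len(sys.argv) > 2 else 0
    length = int(sys.argv[3], 0) if len(sys.argv) > 3 else 256
    with open(path, "rb") as f:
        f.seek(offset)
        print(hexdump(f.read(length), start=offset))
```

Offsets accept `0x` prefixes, so you can jump straight to a spot you noted earlier.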

If you get a hint about what kind of file it is, you can search for more specialized tools.

Running the Unix/Linux "file" command can be useful too (*); see Barry's comment below for how it works. It is a quick check for common file types such as DBF or ZIP that are hidden behind a different extension.

(*) There are third-party builds for Windows, but they may lag behind in version. If you can run it on a recent *nix distro, that is the better option.

Marco van de Voort
What does the "file" command do exactly?
kyndigs
@kyndigs it uses a list of format descriptions (normally byte sequences that different file formats are expected to start with) to try to identify the file format. The format descriptions are called magic, and you'll find them in /etc/magic or /usr/share/file/magic or similar.
Barry Kelly
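A rough idea of what `file` does with those magic entries, as a minimal Python sketch: compare the first bytes against a small table of known signatures. The table is a hand-picked assumption, far smaller than a real magic database. The `TPF0` entry is the filer signature Delphi's TWriter writes at the start of a binary component stream, so a hit there should mean the file was streamed with the VCL classes.

```python
import sys

# Leading-byte signatures in the spirit of file(1)'s magic database (tiny subset).
SIGNATURES = [
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"TPF0", "Delphi binary component stream (TWriter)"),
]


def identify(header: bytes) -> str:
    """Return the description of the first matching signature, or 'unknown'."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: {identify(f.read(16))}")
```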
+2  A: 

The serialization process simply loops over all published properties and streams their values to the file. If you do not know the exact classes that were streamed to the file, you will have a very hard time deserializing it, if it is possible at all.

birger
That assumes it was really serialized using the VCL classes. And VCL (tbinarywriter) streams, IIRC, do contain the property names and a rough type.
Marco van de Voort
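Marco's point can be exploited even without knowing the classes. Delphi's TWriter writes class and property names as short strings (a single length byte followed by that many characters), and `string[N]` fields in packed records are stored the same way. The minimal Python sketch below scans for that pattern. The identifier-like character filter and the `min_len` threshold are assumptions chosen to keep noise down, so expect some false positives.

```python
import re
import sys
from typing import Iterator, Tuple

# Identifier-like text: roughly what class and property names look like (an assumption).
IDENT = re.compile(rb"[A-Za-z_][A-Za-z0-9_.]*")


def find_short_strings(data: bytes, min_len: int = 3) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) wherever a byte n is followed by n identifier-like bytes."""
    for off in range(len(data) - 1):
        n = data[off]
        end = off + 1 + n
        if n < min_len or end > len(data):
            continue
        candidate = data[off + 1:end]
        if IDENT.fullmatch(candidate):
            yield off, candidate.decode("ascii")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for off, text in find_short_strings(f.read()):
            print(f"{off:08x}  {text}")
```

Clusters of hits at regular distances are a good hint that you are looking at an array of records.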
I decompiled the original exe that opened the file and found a series of packed records that are in it, so I have a rough idea of what the file contains; it will just be a case of getting it right. I did something similar with image libraries, but there the visual output made it easier, and each image had a simple bitmap structure.
kyndigs
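Once a packed record has been recovered from the decompiled executable, Python's `struct` module can mirror it directly. The `<` prefix means little-endian with no padding, which matches a Delphi packed record on x86, and the `p` format code reads a length-prefixed Pascal string. The record below is hypothetical (it is not the one kyndigs found), and cp1252 for the string bytes is an assumption for a pre-Unicode Delphi build.

```python
import struct
from typing import Iterator, NamedTuple

# Mirrors a HYPOTHETICAL Delphi declaration, not one taken from the real file:
#   TEntry = packed record
#     Id: Integer;       { 4 bytes, signed }
#     Flags: Word;       { 2 bytes, unsigned }
#     Value: Double;     { 8 bytes }
#     Name: string[15];  { 1 length byte + 15 chars }
#   end;
# '<' = little-endian, no alignment padding, like a packed record on x86.
ENTRY = struct.Struct("<iHd16p")


class Entry(NamedTuple):
    id: int
    flags: int
    value: float
    name: str


def read_entries(data: bytes, offset: int = 0) -> Iterator[Entry]:
    """Decode back-to-back fixed-size records; a trailing partial record is ignored."""
    while offset + ENTRY.size <= len(data):
        id_, flags, value, raw_name = ENTRY.unpack_from(data, offset)
        # 'p' already honours the length byte; cp1252 is an assumed code page.
        yield Entry(id_, flags, value, raw_name.decode("cp1252"))
        offset += ENTRY.size
```

If the file size minus any header is not a multiple of `ENTRY.size`, the guessed layout (or the header length) is probably wrong, which makes this a cheap way to test a hypothesis.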
If you identify the application or format, maybe someone already has what you need?
Lars Fosdal
+1  A: 

A good hex editor comes first. If the file is read without buffering (e.g. directly through a TFileStream), you could gain some information by using Process Monitor (ProcMon) from Sysinternals: you can see exactly which data is read in which chunks, and thus determine more quickly where the boundaries lie between the structures you have already identified.

Ritsaert Hornstra
Hmm, is that entirely unbuffered? If there is some buffering, either in Delphi or in a Windows layer below where ProcMon hooks, you might only see sector-level chunks.
Marco van de Voort
@Marco: Windows buffers in the kernel; what you see are the calls to CreateFile, ReadFile, WriteFile, IoCtrl, etc., and what they return.
Ritsaert Hornstra
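Ritsaert's ProcMon approach yields an offset and a length per ReadFile call. Below is a minimal Python sketch for turning those into candidate boundaries. It assumes the Detail text looks like `Offset: 0, Length: 4,096, Priority: Normal`; check your own export and adjust the regex if it differs. `looks_buffered` is a hypothetical heuristic for Marco's concern: if every read has the same size, you are probably seeing buffer fills rather than individual fields.

```python
import re
from typing import List, Tuple

# ASSUMED shape of ProcMon's Detail column for ReadFile, e.g.
#   "Offset: 8,192, Length: 4,096, Priority: Normal"
# Thousands separators are stripped before converting to int.
DETAIL = re.compile(r"Offset:\s*([\d,]+),\s*Length:\s*([\d,]+)")


def parse_reads(details: List[str]) -> List[Tuple[int, int]]:
    """Extract (offset, length) pairs from Detail strings; non-matching lines are skipped."""
    reads = []
    for text in details:
        m = DETAIL.search(text)
        if m:
            offset, length = (int(g.replace(",", "")) for g in m.groups())
            reads.append((offset, length))
    return reads


def boundaries(reads: List[Tuple[int, int]]) -> List[int]:
    """Distinct start offsets: candidate boundaries between structures."""
    return sorted({offset for offset, _ in reads})


def looks_buffered(reads: List[Tuple[int, int]], min_reads: int = 3) -> bool:
    """Hypothetical heuristic: many reads of one identical size suggest buffer fills."""
    return len(reads) >= min_reads and len({length for _, length in reads}) == 1
```

The boundary offsets can then be fed to a hex viewer to inspect each chunk on its own.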