I have a file that was written with the following Delphi declaration ...


Type
  Tfulldata = Record
    dpoints, dloops : integer;
    dtime, bT, sT, hI, LI : real;
    tm : real;
    data : array[1..armax] Of Real;
  End;

...
Var
  fh: File Of Tfulldata;

I want to analyse the data in the files (many MB in size) using Python if possible - is there an easy way to read in the data and cast the data into Python objects similar in form to the Delphi records? Does anyone know of a library perhaps that does this?

This is compiled on Delphi 7 with the following options, which may (or may not) be pertinent:

  • Record Field Alignment: 8
  • Pentium Safe FDIV: False
  • Stack Frames: False
  • Optimization: True
+2  A: 

I do not know how Delphi stores data internally, but if it is plain byte-wise data (not serialized or otherwise mangled), use the struct module. It lets you treat the bytes read from a file as packed binary data. Also, open the file in binary mode: open(filename, 'rb').
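As a minimal sketch of what struct does (the three-field layout and the sample values here are made up for illustration, not taken from the question's file):

```python
import struct

# Hypothetical layout for illustration: two 32-bit integers followed by one
# 64-bit double. "<" means little-endian with no automatic padding.
HEADER_FORMAT = "<iid"

# Sample bytes standing in for data read from a file opened with 'rb'.
raw = struct.pack(HEADER_FORMAT, 100, 3, 0.25)

# unpack returns a tuple with one Python value per format code.
dpoints, dloops, dtime = struct.unpack(HEADER_FORMAT, raw)
```

Whether the real file matches a given format string depends on how Delphi laid out the record, so the format has to be checked against the actual bytes.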

KillianDS
+2  A: 

Please note that when you define a record in Delphi (like a struct in C), the fields are laid out in order and in binary according to the current alignment (e.g. Bytes are aligned on 1-byte boundaries, Words on 2-byte boundaries, Integers on 4-byte boundaries, etc.), but this may vary with the compiler settings.

By "serialized to a file" you probably mean that the record is written to the file in binary, and the next record is written directly after it, starting at position sizeof(structure), and so on. Delphi does not specify how things should be serialized to or from a file, so the information you give leaves us guessing.
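If the records really are stored back to back, record n starts at byte n * sizeof(record). A rough sketch of that arithmetic in Python, assuming what is worked out later in this thread (Integer is 4 bytes, Real is an 8-byte double, armax = 5025, no padding between fields); `record_offset` and `read_record` are hypothetical helper names:

```python
import struct

ARMAX = 5025  # assumption: the corrected value given later in the thread

# 2 integers + 6 doubles (dtime..tm, plus tm) + ARMAX doubles, little-endian.
RECORD_FORMAT = "<2i6d%dd" % ARMAX
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 2*4 + 6*8 + 5025*8 bytes


def record_offset(index):
    """Byte position of the zero-based record `index` in the file."""
    return index * RECORD_SIZE


def read_record(fh, index):
    """Seek to one record in a binary file object and unpack it into a tuple."""
    fh.seek(record_offset(index))
    chunk = fh.read(RECORD_SIZE)
    if len(chunk) != RECORD_SIZE:
        raise EOFError("record %d is missing or incomplete" % index)
    return struct.unpack(RECORD_FORMAT, chunk)
```

A quick sanity check is that the file size should be an exact multiple of RECORD_SIZE; if it is not, the assumed layout is probably wrong.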

If you want to make sure the layout is always the same regardless of any compiler settings, use a packed record.

Real can have multiple meanings: it is a 48-bit float type in older Delphi versions and a 64-bit float (IEEE double) in later ones.

If you cannot access the Delphi code or compile it yourself, just try to check the data with a HEX editor. You should see the boundaries of the records clearly, since each record starts with Integers and only floats follow.
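The same check can be done from Python instead of a HEX editor. This is only a sketch under the guess that a record starts with two 4-byte integers followed by an 8-byte double; `describe_prefix` is a hypothetical helper name:

```python
import struct


def describe_prefix(raw):
    """Show the first 16 bytes as hex and as two int32 values plus one double."""
    ints = struct.unpack_from("<2i", raw, 0)          # bytes 0..7
    (first_real,) = struct.unpack_from("<d", raw, 8)  # bytes 8..15
    return raw[:16].hex(), ints, first_real
```

Pass it the first 16 bytes read from the file. If the integers come out as plausible counts and the double as a plausible time, the guess is probably right; nonsense values suggest a different Real size or different padding.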

Ritsaert Hornstra
One note: Since there is an array type at the end of the structure, the array could be variable size
Ritsaert Hornstra
I do have access to the code that both reads and writes the data - it seems that in the code itself the only pertinent lines are those in the question. I have specified some options which seem important. I also looked in a HEX editor although this is unfamiliar territory for me - there are lots of random characters separated by large blocks of '00' characters ...
Brendan
Also, the array size is declared with a constant: armax = 5024
Brendan
Okay, I think I got the hang of the HEX editor, I had to decode each number individually. It seems that 'Real' translates to 64 bit IEEE754 doubles here
Brendan
@Brendan: so now you have enough information to decode the structure and the whole file?
Ritsaert Hornstra
Edit: armax = 5025
Brendan
+3  A: 

Here is the full solution, thanks to hints from KillianDS and Ritsaert Hornstra:

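import struct

# Record layout: 2 integers, 6 doubles (dtime .. tm), then data[1..5025] as doubles.
# 2*4 + 6*8 + 5025*8 = 40256 bytes per record.
record_format = 'iidddddd5025d'
with open('my_file.dat', 'rb') as fh:
    s = fh.read(struct.calcsize(record_format))  # reads the first record only
vals = struct.unpack(record_format, s)
dpoints, dloops, dtime, bT, sT, hI, LI, tm = vals[:8]
data = vals[8:]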
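To get objects similar in form to the Delphi record, and to walk through every record rather than just the first, something along these lines may work. It is a sketch that assumes the layout above with armax = 5025; `FullData` and `iter_records` are names made up for illustration:

```python
import struct
from collections import namedtuple

ARMAX = 5025  # assumption: matches the constant used when the file was written

# Same field order as the Delphi record; "<" fixes little-endian, no padding.
RECORD = struct.Struct("<2i6d%dd" % ARMAX)

FullData = namedtuple("FullData", "dpoints dloops dtime bT sT hI LI tm data")


def iter_records(fh):
    """Yield one FullData per record from a file object opened in 'rb' mode."""
    while True:
        chunk = fh.read(RECORD.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) < RECORD.size:
            raise ValueError("file ends with a partial record")
        vals = RECORD.unpack(chunk)
        # First 8 values are the scalar fields, the rest is the data array.
        yield FullData(*vals[:8], data=vals[8:])
```

Usage would then look like `for rec in iter_records(fh): print(rec.dpoints, rec.data[0])`. Note that `rec.data` is zero-indexed in Python, while the Delphi array was declared as 1..armax.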
Brendan
If armax=5024 then it seems you have an off-by-one error. Maybe there was a typo and you meant 5025?
Mark Ransom
Yeah, that's right, armax=5025. I was working from memory, and I had recently adjusted the value so that the data arrays were zero indexed and got confused. I find it annoying that Delphi has dynamic/open arrays zero indexed yet many functions (e.g. Copy()) start at one ...
Brendan
@Brendan: Dynamic arrays start at zero, only strings start at 1 by default (due to historic reasons). When you define an array yourself those offsets are used everywhere.
Ritsaert Hornstra
Well I suppose - however you may think that when you pass a static, one-indexed array to a procedure which modifies the array then inside the procedure the array should also be one-indexed, but it is cast as an open array and becomes zero indexed. `High()` and `Low()` functions help find the limits, but in my case the static array passed is not fully used and a separate `numValidPoints` parameter is passed - so zero-indexed/one indexed does matter. To avoid ambiguity I changed all the arrays to zero indexed.
Brendan