views:

487

answers:

3

I am trying to understand the file format of a Visual FoxPro compact index (*.IDX). I am currently referring to Microsoft's documentation for guidance.

The index is a B-tree of 512-byte nodes. Each leaf ("exterior") node contains multiple entries. Each entry consists of four pieces of data:

  • Row number [FIXED LENGTH]
  • Duplicate byte count (documentation doesn't explain this) [FIXED LENGTH]
  • Trailing byte count (documentation doesn't explain this) [FIXED LENGTH]
  • Key [VARIABLE LENGTH]

The entries (without their keys) are stored at the beginning of the node, immediately after the node's 24-byte header. Their keys are not included at this location because the keys vary in length, while the row number, duplicate byte count and trailing byte counts are fixed in length. The keys are stored at the end of the node and work their way backward. For example:

  • 24 byte header
  • row number, duplicate byte count, trailing byte count (entry #1)
  • row number, duplicate byte count, trailing byte count (entry #2)
  • row number, duplicate byte count, trailing byte count (entry #3)
  • ...
  • key (entry #3)
  • key (entry #2)
  • key (entry #1)

How do I determine the individual lengths of the keys? The documentation does not appear to specify this. They are perfectly contiguous (no null-byte separators).

I can isolate the keys manually by visual inspection. I suspected that the trailing byte count represented the length of the key. However, it did not correlate to the lengths determined by this inspection.

I believe that the FoxPro file formats are derived from the xBase standard. Perhaps this rings a bell?

A: 

Microsoft says this about the IDX file structure (and at the bottom of the page there are links to all others like the Compact Index format.)

boost
Thanks for the links. However, it appears that the Compact Index Format you linked to describes the exact same structures as the one I linked to (yours refers to FoxPro 8.0, mine refers to FoxPro 9.0 SP2, both are identical in this regard).
Adam Paynter
oh dear. Embarrassment++
boost
+3  A: 

After discovering the XBase::Index Perl module, I have determined that the keys in the exterior node are effectively the same length as the fixed-length keys found in the interior nodes, except any trailing spaces are removed. That is what the "trailing byte count" mentioned in the documentation refers to (how many trailing spaces were truncated off the end of the key). I still have not determined what the "duplicate byte count" is, but the module at least clarified its relationship:

variable_key_length = fixed_key_length - duplicate_byte_count - trailing_byte_count

For example, suppose the fixed key length for this index was 10 bytes. Now suppose that the key "DOG " was stored in an external node. Its duplicate byte count (according to what I have observed) will most likely be zero, while its trailing byte count will be 7 (the number of spaces truncated). Therefore, only the three bytes representing "DOG" would be stored.

Adam Paynter
+1  A: 

About duplicate byte count: this mean the number of first bytes, which are same in current key and in previous key. The first key entry stored at the end of the node has a full length, except trailing blanks; successive key entry has only symbols different from previous key entry.

Andrew
Excellent! Thanks for the tip! :) How did you find this out?
Adam Paynter