I need to read data from a text file where the field lengths and record lengths are fixed. Fields are either zero-padded or space-padded, always appear in the same order, and each record is terminated by a CRLF. Each record is one of three possible types, determined by its first character.

So far I've created a base class for all record types and a child class for each record type.

type
  TRecordBase = class abstract
  public
    // Various common fields...
    function ToString: string; virtual; abstract;
    procedure Read(AString: string); virtual; abstract;
  end;

  TRecordType1 = class(TRecordBase)
  public
    //RecordType1 fields...
    function ToString: string; override;
    procedure Read(AString: string); override;
  end;

  TRecordType2 = class(TRecordBase)
  public
    //RecordType2 fields...
    function ToString: string; override;
    procedure Read(AString: string); override;
  end;

  TRecordType3 = class(TRecordBase)
  public
    //RecordType3 fields...
    function ToString: string; override;
    procedure Read(AString: string); override;
  end;

Then I simply read each line of the file as a string, determine its type from the first character, create the appropriate class instance and call Read.

The idea is that the Record classes can be used for both reading and writing to a string representation of the record. The Read procedure needs to break up a string and assign it to public fields.

I have two (or three) questions:

  • Is this a good approach to handle this type of file?
  • If so, what would your implementation of the Read procedure look like? (I've dealt with delimited files but this is my first encounter with fixed length fields)
  • If not, what approach would you take?

Update

Just thought I'd fill in some of the missing details. These record classes are essentially DTOs (data transfer objects). The fields are declared public and the only methods are for conversion to/from a string. The only data validation on the fields is the compiler's type checking. Fields are converted to string in the required order using TStringBuilder.AppendFormat. This ensures fields are padded and/or truncated to the proper length.
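As a rough illustration of the padding and truncation described above, here is a sketch using SysUtils.Format, which takes the same kind of format string as TStringBuilder.AppendFormat. The layout (a 1-character record type, a 5-digit zero-padded Id and a 10-character space-padded Name) and the function name FormatRecord are made up for the example, not taken from the actual spec.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Hypothetical layout: type (1) + Id (5, zero-padded) + Name (10, space-padded)
function FormatRecord(RecType: Char; Id: Integer; const Name: string): string;
begin
  // %.5d     -> at least 5 digits, left-padded with zeros
  // %-10.10s -> left-justified, padded to 10 chars, truncated to 10 chars
  Result := Format('%s%.5d%-10.10s', [RecType, Id, Name]);
end;
```

Note that the precision in %.5d only sets a minimum number of digits. A value with more than 5 digits is not truncated, so numeric fields would still need a range check if overflow is possible.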

I went with Rob's suggestion to use Copy combined with the appropriate StrTo* for getting data from the string. I've also defined field positions and lengths as class constants, i.e.

const Field1Pos = 1;
const Field1Length = 1;
const Field2Pos = 2;
const Field2Length = 5;

The consts are a little easier to read than "magic numbers" in the calls to Copy.

Any other suggestions would be appreciated.

+1  A: 

Looks OK to me. For extracting the fields, you can use the Copy standard function. Give it the input string, the index of the first character of the field, and the number of characters, and it will return that portion as a new string, which you can then assign to another string variable or pass to another function for further conversion, such as StrToInt.
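A minimal sketch of that idea, assuming a made-up layout where a zero-padded number occupies positions 2 to 6 and a space-padded text field occupies positions 7 to 16. The constants and the helper names ReadIntField and ReadTextField are hypothetical.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

const
  // Hypothetical layout, using the 1-based positions that Copy expects
  IdPos = 2;
  IdLength = 5;     // zero-padded number
  NamePos = 7;
  NameLength = 10;  // space-padded text

// Zero-padded numeric field: StrToInt accepts leading zeros
function ReadIntField(const Line: string; Start, Len: Integer): Integer;
begin
  Result := StrToInt(Trim(Copy(Line, Start, Len)));
end;

// Space-padded text field: strip the trailing padding
function ReadTextField(const Line: string; Start, Len: Integer): string;
begin
  Result := TrimRight(Copy(Line, Start, Len));
end;
```

A Read method could then assign fields with calls such as ReadIntField(AString, IdPos, IdLength). StrToInt raises EConvertError on malformed input; StrToIntDef is an alternative if a default value is preferable to an exception.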

Rob Kennedy
A: 

I think your approach is a very elegant solution.

The one thing you don't specify is how your fields will work. Since they are fixed length, I would consider making them properties so that each property's setter can validate the length.
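Something along these lines, as a sketch only. The class TRecordSketch, its Name field and the 10-character limit are invented for illustration.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

const
  NameLength = 10; // hypothetical field width

type
  // Hypothetical record class with one fixed-width text field
  TRecordSketch = class
  private
    FName: string;
    procedure SetName(const Value: string);
  public
    property Name: string read FName write SetName;
  end;

procedure TRecordSketch.SetName(const Value: string);
begin
  // Reject values that would not fit in the fixed-width field
  if Length(Value) > NameLength then
    raise Exception.CreateFmt(
      'Name is limited to %d characters', [NameLength]);
  FName := Value;
end;
```

Whether to raise or silently truncate is a design choice; raising makes bad data visible at the point of assignment rather than when the record is written out.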

Robert Love
+2  A: 

I'd change one thing: replace the Read procedure with a constructor that does the reading, something like this:

TRecordBase = class
public
  constructor CreateFromText(Text:string);virtual;abstract;
end;

TRecordType1 = class(TRecordBase)
public
  constructor CreateFromText(Text:string);override;
end;

Depending on what you do with your records this will save some typing and make code easier to read:

var s: string; // string from stream or string-list
if s[1] = 'X' then DoSomethingWith(TRecordType1.CreateFromText(s));

Having a virtual constructor is also handy if the number of record types grows. You can do something like this:

// Define a class type
type TRecordBaseClass = class of TRecordBase;

// Using Delphi 2010? Use a dictionary (unit Generics.Collections) to register (FirstChar, TRecordBaseClass) pairs
var RecordClassDictionary: TDictionary<Char, TRecordBaseClass>;

// Init the dictionary like this:
RecordClassDictionary := TDictionary<Char, TRecordBaseClass>.Create;
RecordClassDictionary.Add('1', TRecordType1);
RecordClassDictionary.Add('2', TRecordType2);
RecordClassDictionary.Add('3', TRecordType3);

// And use it like this:
var RecordBaseClass: TRecordBaseClass;
for line in TextToParse do
  if RecordClassDictionary.TryGetValue(line[1], RecordBaseClass) then
     // Read the record, do something with the record
     DoSomethingWithTheRecord(RecordBaseClass.CreateFromText(line))
  else
     raise Exception.Create('Unknown record type.');
Cosmin Prund
Ah, a slight modification of the Abstract Factory pattern. I like it. I doubt the number of record types will increase. They've remained constant for about a decade now. The group that wrote the spec will likely abandon it in favor of XML if they ever do decide to make changes.
codeelegance
+1  A: 

If the field length and record length are fixed, I'd use the almost forgotten records with a variant part:

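TRecord1 = packed record
  A: array[0..10] of AnsiChar; // AnsiChar: Char is two bytes in Delphi 2009+
end;

TRecord2 = packed record
  B: array[0..20] of Byte;
  C: array[0..5] of Byte;
end;

TRecord3 = packed record
  D: array[0..10] of Byte;
  E: array[0..15] of Byte;
  F: array[0..1] of Byte;
end;


TMyRecord = packed record
  case RecordType: AnsiChar of
    '1': (Rec1: TRecord1);
    '2': (Rec2: TRecord2);
    '3': (Rec3: TRecord3);
end;
PMyRecord = ^TMyRecord;

// S: AnsiString; F: the text file being read
ReadLn(F, S);

// Overlay the record on the line's bytes; S must be at least SizeOf(TMyRecord) long
with PMyRecord(@S[1])^ do
begin
  ...
end;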

If you're using a Delphi release that supports record methods you can use them to access fields as well.
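For example, a record method can turn the raw character arrays into typed values. This is only a sketch: the layout (5 zero-padded digits followed by 10 space-padded characters) and the names TRecordSketch, IdAsInteger, NameAsString and LineToRecord are all hypothetical.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

type
  // Hypothetical layout: Id (5, zero-padded) + Name (10, space-padded)
  TRecordSketch = packed record
    Id: array[0..4] of AnsiChar;
    Name: array[0..9] of AnsiChar;
    function IdAsInteger: Integer;
    function NameAsString: string;
  end;

function TRecordSketch.IdAsInteger: Integer;
var
  S: AnsiString;
begin
  // Build a string from the fixed-size buffer, then convert it
  SetString(S, PAnsiChar(@Id[0]), Length(Id));
  Result := StrToInt(string(S));
end;

function TRecordSketch.NameAsString: string;
var
  S: AnsiString;
begin
  SetString(S, PAnsiChar(@Name[0]), Length(Name));
  Result := TrimRight(string(S)); // drop the space padding
end;

// Copy a line's bytes into the record after checking that it is long enough
function LineToRecord(const Line: AnsiString): TRecordSketch;
begin
  if Length(Line) < SizeOf(TRecordSketch) then
    raise Exception.Create('Line is too short');
  Move(Line[1], Result, SizeOf(TRecordSketch));
end;
```

Copying with Move, as LineToRecord does here, costs a few bytes per line but avoids keeping a pointer into a string whose length was never checked.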

ldsandon