Hi there, I was wondering if anyone had any advice on parsing a file with fixed-length records in Ruby. The file has several sections; each section has a header, n data elements and a footer. For example (this is total nonsense, but has roughly similar content):

1923  000-230SomeHeader     0303030 
209231-231992395    MoreData
293894-329899834    SomeData
298342-323423409    OtherData
3     3423942Footer record  9832422

Header, data and footer rows each begin with a specific digit (1, 2 and 3 respectively) in this example.

I have looked at http://rubyforge.org/projects/file-formatter/ and it looks good - except that the documentation is light and I can't see how to have n data elements.

Cheers, Dan

A: 

Several options exist as usual.

If you want to do it manually I would suggest something like this:

very pseudo-code:

Read file
while lines in file
    handle_line(line) 
end

def handle_line(line)
    type = first character of line
    parse_line(type, line)
end

def parse_line(type, line)
    split line into elements according to type and do_whatever_to_them
end
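
As a rough, runnable take on the pseudo-code above, the sketch below reads lines, dispatches on the first character, and groups them into sections with one header, any number of data rows and one footer. The Section struct and the read_sections method are hypothetical names made up for this example, and the sample rows are the nonsense data from the question.

```ruby
require "stringio"

# Hypothetical container: one header, any number of data rows, one footer.
Section = Struct.new(:header, :rows, :footer)

# Reads lines from any IO-like object and groups them by record type.
def read_sections(io)
  sections = []
  current = nil
  io.each_line do |raw|
    line = raw.chomp
    next if line.empty?
    case line[0, 1]
    when "1" # header opens a new section
      current = Section.new(line, [], nil)
    when "2" # data row, there may be any number of these
      raise ArgumentError, "data row before any header" unless current
      current.rows << line
    when "3" # footer closes the section
      raise ArgumentError, "footer before any header" unless current
      current.footer = line
      sections << current
      current = nil
    else
      raise ArgumentError, "unknown record type in #{line.inspect}"
    end
  end
  sections
end

input = StringIO.new(<<~DATA)
  1923  000-230SomeHeader     0303030
  209231-231992395    MoreData
  293894-329899834    SomeData
  298342-323423409    OtherData
  3     3423942Footer record  9832422
DATA

sections = read_sections(input)
puts sections.first.rows.length
```

For a real file, File.open(path) { |f| read_sections(f) } should work the same way, since the method only relies on each_line.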

Splitting the line into elements of fixed width can be done with, for instance, unpack():

irb(main):001:0> line="1923  000-230SomeHeader     0303030"
=> "1923  000-230SomeHeader     0303030"
irb(main):002:0> list=line.unpack("A1A5A7a15A10")
=> ["1", "923", "000-230", "SomeHeader     ", "0303030"]
irb(main):003:0>

The pattern passed to unpack() will vary with the field lengths of the different kinds of records, and the code will depend on whether you want to keep trailing spaces and the like. See the unpack reference for details.
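
One way to handle a different pattern per record type is a lookup table keyed on the first character. This is only a sketch: the FORMATS table and the parse_line method are hypothetical, and the field widths are guessed from the sample rows in the question, so real layouts will differ. It also shows the difference between the "A" directive (drops trailing spaces) and "a" (keeps them).

```ruby
# Assumed layouts, guessed from the sample rows in the question.
FORMATS = {
  "1" => "A1A5A7A15A10", # header
  "2" => "A1A5A1A9A4A*", # data row
  "3" => "A1A5A7A15A*"   # footer
}.freeze

# Picks the pattern from the first character and splits the line with it.
def parse_line(line)
  type = line[0, 1]
  format = FORMATS.fetch(type) do
    raise ArgumentError, "unknown record type #{type.inspect}"
  end
  line.unpack(format)
end

header = parse_line("1923  000-230SomeHeader     0303030")
data   = parse_line("209231-231992395    MoreData")
footer = parse_line("3     3423942Footer record  9832422")

# "a" keeps the padding of a field, "A" strips trailing spaces.
padded   = "SomeHeader     ".unpack("a15").first
stripped = "SomeHeader     ".unpack("A15").first

p header
p data
p footer
```

With "A" everywhere, a field that contains only spaces comes back as an empty string, which may or may not be what you want for filler columns.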

Knut Haugen
+2  A: 

There are a number of ways to do this. The unpack method of String can split a line according to a pattern of fields, as follows:

"209231-231992395    MoreData".unpack('aa5A1A9a4Z*')

This returns the following array:

["2", "09231", "-", "231992395", "    ", "MoreData"]

See the documentation for a description of the pack/unpack format.
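
If positional arrays get hard to read, the unpacked values can be paired with field names. A minimal sketch, reusing the pattern above; the names in DATA_FIELDS and the parse_data_row method are hypothetical, since the question does not say what the columns mean.

```ruby
# Hypothetical field names, one per directive in the pattern.
DATA_FIELDS = [:type, :code, :separator, :number, :filler, :label].freeze

# Unpacks one data row and returns a Hash of field name => value.
def parse_data_row(line)
  values = line.unpack("aa5A1A9a4Z*")
  DATA_FIELDS.zip(values).to_h
end

row = parse_data_row("209231-231992395    MoreData")
puts row[:label]
```

Array#to_h needs Ruby 2.1 or later; on older versions Hash[DATA_FIELDS.zip(values)] builds the same Hash.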

Steve Weet