I have a fixed-width flat file. To make matters worse, each line can either be a new record or a subrecord of the line above, identified by the first character on each line:

A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321      # Separate
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442      # records
B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT # sub-record of record "0021"

I've tried using Flatworm, which seems to be an excellent library for parsing fixed-width data. Its documentation, unfortunately, states:

"Repeating segments are supported only for delimited files"

(ibid, "Repeating segments").

I'd rather not write a custom parser for this. (1) Is it possible to do this in Flatworm, or (2) is there another library that provides such (multi-line, multi-sub-record) capabilities?

+1  A: 

Have you looked at JRecordBind?

http://jrecordbind.dev.java.net/

"JRecordBind supports hierarchical fixed length files: records of some type that are 'sons' of other record types."
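For illustration only, here is a minimal plain-Java sketch of what "hierarchical" means for the sample in the question. It does not use JRecordBind's API (or Flatworm's). The class names `ParentRecord` and `RecordGrouper` are made up, and the layout is an assumption read off the sample: column 0 is the record type (`A` = record, `B` = sub-record) and columns 1-4 are the record id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one "A" line plus the "B" lines that follow it.
class ParentRecord {
    final String id;                       // assumed: columns 1-4 of the line
    final String line;                     // the raw "A" line
    final List<String> subRecords = new ArrayList<>();

    ParentRecord(String id, String line) {
        this.id = id;
        this.line = line;
    }
}

// Hypothetical helper, not part of any library mentioned in this thread.
class RecordGrouper {

    // Attaches each "B" line to the most recent "A" line with the same id.
    static List<ParentRecord> group(List<String> lines) {
        List<ParentRecord> records = new ArrayList<>();
        ParentRecord current = null;
        for (String line : lines) {
            if (line.length() < 5) {
                continue;                  // skip blank or truncated lines
            }
            char type = line.charAt(0);
            String id = line.substring(1, 5);
            if (type == 'A') {
                current = new ParentRecord(id, line);
                records.add(current);
            } else if (type == 'B') {
                if (current == null || !current.id.equals(id)) {
                    throw new IllegalArgumentException("Orphan sub-record: " + line);
                }
                current.subRecords.add(line);
            } else {
                throw new IllegalArgumentException("Unknown record type: " + line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321",
            "A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442",
            "B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT");
        for (ParentRecord r : group(sample)) {
            System.out.println(r.id + " has " + r.subRecords.size() + " sub-record(s)");
        }
    }
}
```

The fixed-width fields inside each line would still need to be cut out with `substring` at whatever offsets the real format defines; the sketch only shows the parent/child grouping step.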

Skip Head
A: 

You can import the data into a relational database.

emory
Sorry, but given the sample provided -- what do you think I'm trying to do?
matiasf
I don't know what you are trying to do, but you can use something like http://dev.mysql.com/doc/refman/5.1/en/load-data.html to bulk load this data into a MySQL database. It is really good at handling record-oriented data. Then, when you need the data, select it. The entire thing can be done through JDBC.
emory
Hi emory: The dataset is a fixed-width flat file with multiple subrecords (think COBOL output). I agree that the MySQL import routines are very good and efficient for CSV data, but none of the routines are able to handle fixed-width formats.
matiasf
A: 

Check out Preon. Although Preon targets bitstream-compressed data, you might be able to twist its arm and use it for the file format you described as well. An added benefit of using Preon is that it generates human-readable documentation of the format.

Wilfred Springer