views: 147
answers: 2

I'm working on a .NET web service that will process a text file with a relatively long, multilevel record format. Each record in the file represents a different entity, and a record contains multiple sub-types. (The same record format is currently processed by a COBOL job, if that gives you a better picture of what we're looking at.) I've created a class structure (a DATA DIVISION, if you will) to hold the input data.

My question is, what best practices have you found for processing large, complex fixed-width files in .NET? My general approach will be to read the entire line into a string and then parse the data from the string into the classes I've created. But I'm not sure whether I'll get better results working with the characters in the string as an array, or with the string itself. I guess that's the specific question, string vs. char[], but I would appreciate any other pointers anyone has.
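
For context, the general shape of what I have in mind is roughly the sketch below (the class name, field names, offsets, and widths are all made up for illustration, not our actual record layout):

```csharp
// Read each fixed-width line into a string, then slice fields out by position.
// EntityRecord (defined below) is a hypothetical class, not our real layout.
using (var reader = new System.IO.StreamReader("input.dat"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        var entity = new EntityRecord
        {
            RecordType = line.Substring(0, 2),
            AccountId  = line.Substring(2, 10).Trim(),
            Amount     = decimal.Parse(line.Substring(12, 9))
        };
        // ...dispatch to the appropriate sub-type class based on RecordType
    }
}

public class EntityRecord
{
    public string  RecordType { get; set; }
    public string  AccountId  { get; set; }
    public decimal Amount     { get; set; }
}
```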

Thanks.

+5  A: 

I would build classes that match the data in the rows, using attributes for types, lengths, etc. Then use the Microsoft.VisualBasic.FileIO.TextFieldParser class to read the file, with some generic code that configures the parser from the class, reads the data, and creates an instance of the class (all using reflection).

I use this for reading CSVs and it's fast, flexible, extensible, generic and easy to maintain. I also have attributes that allow me to add generic validation to each field as it's being read.

I'd share my code, but it's the IP of the firm I work for.
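
As a rough sketch of the shape of the approach, though (the attribute, the record class, and the conversion logic here are invented for illustration and are not the code I actually use):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

// Hypothetical attribute describing one fixed-width field: its position and width.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FixedWidthFieldAttribute : Attribute
{
    public FixedWidthFieldAttribute(int order, int width) { Order = order; Width = width; }
    public int Order { get; }
    public int Width { get; }
}

// Example record class; field names and widths are made up for illustration.
public class CustomerRecord
{
    [FixedWidthField(0, 10)] public string  CustomerId { get; set; }
    [FixedWidthField(1, 30)] public string  Name       { get; set; }
    [FixedWidthField(2, 9)]  public decimal Balance    { get; set; }
}

public static class FixedWidthReader
{
    // Configure TextFieldParser from the attributes, then hydrate one T per line via reflection.
    public static IEnumerable<T> Read<T>(string path) where T : new()
    {
        var fields = typeof(T).GetProperties()
            .Select(p => new { Property = p, Attr = p.GetCustomAttribute<FixedWidthFieldAttribute>() })
            .Where(x => x.Attr != null)
            .OrderBy(x => x.Attr.Order)
            .ToArray();

        using (var parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(fields.Select(f => f.Attr.Width).ToArray());

            while (!parser.EndOfData)
            {
                string[] values = parser.ReadFields();
                var record = new T();
                for (int i = 0; i < fields.Length; i++)
                {
                    var prop = fields[i].Property;
                    // Per-field validation attributes could hook in here before the conversion.
                    object converted = Convert.ChangeType(values[i].Trim(), prop.PropertyType);
                    prop.SetValue(record, converted);
                }
                yield return record;
            }
        }
    }
}
```

Usage is then just `foreach (var rec in FixedWidthReader.Read<CustomerRecord>("input.dat")) { ... }`. The appeal of driving everything off the attributes is that supporting a new record layout means writing a new attributed class; the reader itself never changes.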

ck
+1 for being aware of TextFileParser. No-one seems to know about it on here. Unless there's some secret problem with it, such that no-one even likes to mention it?
MarkJ
Agreed. I never heard of it before either. Thanks a lot ck.
John M Gant
Actually, it is named `TextFieldParser`. I agree that it is quite a useful class.
Gart
@Gart - thanks for the correction, I've fixed it now :)
ck