ansaurus

Question

C#, reading in Fixed Width records, varying record types in one file

Answer 1

A:

The best library for this sort of thing is FileHelpers.

lomaxx 2010-07-03 14:51:44

Going to download and fool around with this. My fear is that I will have to open the entire file 5 times, once for each 'specification' class that will be implemented by this assembly.

Mohgeroth 2010-07-03 15:06:41

Answer 2

+1 A:

One critique I have is that you are not correctly implementing ToString.

    public string ToString()

Should be:

    public override string ToString()
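
To see why the modifier matters, here is a minimal sketch. The class names `Hidden` and `Overridden` are made up for illustration. Without `override`, the method only hides `Object.ToString()`, so any call made through an `object` reference (string formatting, `Console.WriteLine`, the debugger) still gets the default type name.

```csharp
using System;

// Hypothetical classes, named only for this illustration.
class Hidden
{
    // "new" behaves the same as omitting the modifier, minus compiler
    // warning CS0114: the method hides Object.ToString(), it does not override it.
    public new string ToString() { return "hidden"; }
}

class Overridden
{
    // Takes part in virtual dispatch, so it is used everywhere.
    public override string ToString() { return "overridden"; }
}

static class ToStringDemo
{
    // Calls ToString() through an object reference, as Console.WriteLine
    // and String.Format do internally.
    public static string Describe(object o) { return o.ToString(); }
}
```

Calling `ToStringDemo.Describe(new Overridden())` returns the custom text, while `ToStringDemo.Describe(new Hidden())` falls back to the type name.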

Mark Byers 2010-07-03 14:54:10

I'm used to Java doing that for me. Thanks for the critique!

Mohgeroth 2010-07-03 15:05:31

Answer 3

+1 A:

FileHelpers is nice. It has a couple of drawbacks in that it doesn't seem to be under active development anymore, and it makes you use public variables for your fields instead of letting you use properties. But otherwise good.

What are you doing with these files? Are you loading them into SQL Server? If so, and you're looking for FAST and SIMPLE, I'd recommend a design like this:

1) Make staging tables in your database that correspond to each of the 5 record types. Consider adding a LineNumber column and a FileName column too, just so you can trace problems back to the file itself.
2) Read the file line by line and parse it out into your business objects, or directly into ADO.NET DataTable objects that correspond to your tables.
3) If you used business objects, apply your data transformations or business rules and then put the data into DataTable objects that correspond to your tables.
4) Once each DataTable reaches an appropriate BatchSize (say 1000 records), use the SqlBulkCopy object to pump the data into your staging tables. After each SqlBulkCopy operation, clear out the DataTable and continue processing.
5) If you didn't want to use business objects, do any final data manipulation in SQL Server.
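
The batching part of that design could look roughly like the sketch below. `StagingBatcher` and its members are hypothetical names, and it assumes a reference to `System.Data.SqlClient`. The flush step is passed in as a delegate so the buffering logic can be exercised without a database.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper: buffers parsed rows in a DataTable and flushes in batches.
class StagingBatcher
{
    private readonly DataTable _table;
    private readonly int _batchSize;
    private readonly Action<DataTable> _flush;

    public StagingBatcher(DataTable schema, int batchSize, Action<DataTable> flush)
    {
        _table = schema;
        _batchSize = batchSize;
        _flush = flush;
    }

    // Adds one row; values must be in the same order as the table's columns.
    public void Add(params object[] values)
    {
        _table.Rows.Add(values);
        if (_table.Rows.Count >= _batchSize) Flush();
    }

    // Call once more after the last line so a partial batch is not lost.
    public void Flush()
    {
        if (_table.Rows.Count == 0) return;
        _flush(_table);
        _table.Clear(); // removes the rows, keeps the column definitions
    }

    // Flush action that bulk-copies the buffered rows into a staging table.
    public static Action<DataTable> ToSqlServer(string connectionString, string stagingTable)
    {
        return table =>
        {
            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = stagingTable;
                bulk.WriteToServer(table);
            }
        };
    }
}
```

You would create one batcher per record type, for example `new StagingBatcher(detailTable, 1000, StagingBatcher.ToSqlServer(connectionString, "DetailStaging"))`, where the table and connection string are placeholders for your own.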

You could probably accomplish the whole thing in under 500 lines of C#.

mattmc3 2010-07-03 15:17:35

I definitely don't want to put this in SQL Server, since the raw extract files for one year alone come to over 3 GB! These files stand as our backup. We want certain things for both billing and client record keeping, but in reality, if someone wants to know something about client X at a point in time, we can just unzip the files (the compression rate is 98%) and run a process that reads through them and pulls out what the client wants to know. Reading through this data fast helps, so we can build a nice interface later to drill down into it. Great information though, thanks!

Mohgeroth 2010-07-03 15:47:38

Answer 4

+1 A:

Biggest question besides some critique is, how should I bring in this file?

I do not know of any good library for file IO, but the reading is pretty straightforward.

Instantiate a StreamReader with a 64 kB buffer to limit disk IO operations (my estimate is an average of 1500 transactions per file by the end of the month).

Now you can stream over the file:
1) Use Read at the beginning of each line to determine the record type.
2) Use ReadLine with String.Substring at the known column offsets to get the column values (String.Split only helps if the columns are delimited).
3) Create the object from the column values.

or

You could also buffer the data from a Stream manually and use IndexOf + Substring for more performance (if done right).
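
The line-by-line variant (steps 1 to 3) might look like the sketch below. The record type codes and column widths are invented, since the real layouts are not shown here. It takes the type from the first character of each line rather than a separate Read call, and it slices with Substring because fixed-width columns have no delimiter to split on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class FixedWidthReader
{
    // Hypothetical layouts: record type code -> column widths after the code.
    static readonly Dictionary<char, int[]> Layouts = new Dictionary<char, int[]>
    {
        { 'H', new[] { 8, 10 } },    // e.g. header: date, batch id
        { 'D', new[] { 6, 12, 9 } }  // e.g. detail: account, name, amount
    };

    // Streams the input once and yields (type code, trimmed column values).
    public static IEnumerable<KeyValuePair<char, string[]>> Read(Stream input)
    {
        // 64 kB buffer, as suggested above, to cut down on disk reads.
        using (var reader = new StreamReader(input, Encoding.ASCII, false, 64 * 1024))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue; // skip blank lines
                char type = line[0];
                int[] widths;
                if (!Layouts.TryGetValue(type, out widths))
                    throw new InvalidDataException("Unknown record type: " + type);
                yield return new KeyValuePair<char, string[]>(type, Slice(line, 1, widths));
            }
        }
    }

    // Cuts consecutive columns out of the line; throws if the line is too short.
    static string[] Slice(string line, int start, int[] widths)
    {
        var fields = new string[widths.Length];
        for (int i = 0; i < widths.Length; i++)
        {
            fields[i] = line.Substring(start, widths[i]).Trim();
            start += widths[i];
        }
        return fields;
    }
}
```

Because `Read` is an iterator, only one line is held in memory at a time, and every record type is handled in a single pass over the file.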

Also if the lines weren't columns but primitive datatypes in binary format, you could use the BinaryReader class for a very easy and performant way to read the objects.
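
For comparison, a binary layout could be read as in the sketch below. The `Transaction` fields and their order are assumptions made up for the example, not the format of the extract files discussed here.

```csharp
using System;
using System.IO;

// Hypothetical record: the fields and their order are invented for this sketch.
struct Transaction
{
    public int AccountId;
    public decimal Amount;
    public long Ticks; // timestamp stored as DateTime ticks
}

static class BinaryRecords
{
    // Writes the fields as fixed-size primitives, with no delimiters or padding.
    public static void Write(BinaryWriter writer, Transaction t)
    {
        writer.Write(t.AccountId);
        writer.Write(t.Amount);
        writer.Write(t.Ticks);
    }

    // Reads the fields back; the order must match the write order exactly.
    public static Transaction Read(BinaryReader reader)
    {
        Transaction t;
        t.AccountId = reader.ReadInt32();
        t.Amount = reader.ReadDecimal();
        t.Ticks = reader.ReadInt64();
        return t;
    }
}
```

There is no text parsing at all here, which is where the speed comes from, but the file is no longer human-readable and both sides must agree on the layout.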

Jaroslav Jandek 2010-07-03 18:42:53

I got better performance and less headache using the MultiRecordEngine of FileHelpers for what I'm trying to do. Not the type of approach I would have hoped for, but it's efficient enough.
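
For anyone landing here later, a rough sketch of that setup follows. The record classes and column widths are invented, and the calls are written from my reading of the FileHelpers documentation, so check them against the version you install. The selector sees each line and returns the class to parse it with, so the file is only read once no matter how many record types it holds.

```csharp
using System;
using FileHelpers;

// Hypothetical layouts; FileHelpers maps public fields in declaration order.
[FixedLengthRecord]
public class HeaderRecord
{
    [FieldFixedLength(1)] public string RecordType;
    [FieldFixedLength(8)] public string FileDate;
    [FieldFixedLength(10)] public string BatchId;
}

[FixedLengthRecord]
public class DetailRecord
{
    [FieldFixedLength(1)] public string RecordType;
    [FieldFixedLength(6)] public string Account;
    [FieldFixedLength(12)] public string Name;
    [FieldFixedLength(9)] public string Amount;
}

static class MultiRecordDemo
{
    // Picks the record class from the first character of each line.
    // Returning null is meant to make the engine skip the line.
    static Type SelectRecord(MultiRecordEngine engine, string recordLine)
    {
        if (recordLine.Length == 0) return null;
        switch (recordLine[0])
        {
            case 'H': return typeof(HeaderRecord);
            case 'D': return typeof(DetailRecord);
            default: return null;
        }
    }

    // Parses the whole file in one pass; the result mixes both record classes.
    public static object[] Load(string path)
    {
        var engine = new MultiRecordEngine(typeof(HeaderRecord), typeof(DetailRecord));
        engine.RecordSelector = new RecordTypeSelector(SelectRecord);
        return engine.ReadFile(path);
    }
}
```

The caller can then branch on the runtime type of each element, for example with `record is HeaderRecord`.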

Mohgeroth 2010-07-10 02:21:23
