To start, I should say that I'm not especially well versed in C#. That said, a project I'm working on in C# using .NET 3.5 has me building a class to read and export files that contain multiple fixed-width formats, selected by record type.
There are currently 5 record types. The first character of each line identifies the type, which determines that line's format. The problem I have is that the formats are all distinct from each other.
Record type 1 has 5 columns, signifies beginning of the file
Record type 3 has 10 columns, signifies beginning of a batch
Record type 5 has 69 columns, signifies a transaction
Record type 7 has 12 columns, signifies end of the batch, summarizes
(types 3, 5 and 7 repeat throughout the file, once per batch)
Record type 9 has 8 columns, signifies end of the file, summarizes
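To make the layout above concrete, here is a minimal sketch of dispatching on the first character of a line. The enum and class names are hypothetical; only the digits 1, 3, 5, 7 and 9 come from the file description.

```csharp
using System;

// Hypothetical names; only the digits 1, 3, 5, 7, 9 come from the file description.
public enum RecordKind
{
    FileHeader = 1,
    BatchHeader = 3,
    Transaction = 5,
    BatchTrailer = 7,
    FileTrailer = 9
}

public static class RecordClassifier
{
    // Maps the first character of a line to its record kind.
    public static RecordKind Classify(string line)
    {
        if (string.IsNullOrEmpty(line))
            throw new FormatException("An empty line has no record type.");

        switch (line[0])
        {
            case '1': return RecordKind.FileHeader;
            case '3': return RecordKind.BatchHeader;
            case '5': return RecordKind.Transaction;
            case '7': return RecordKind.BatchTrailer;
            case '9': return RecordKind.FileTrailer;
            default:
                throw new FormatException("Unknown record type '" + line[0] + "'.");
        }
    }
}
```

Throwing on an unknown leading character is a design assumption; a real importer might prefer to log and skip such lines.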
Is there a good library out there for these kinds of fixed-width files? I've seen a few good ones, but they want to load the entire file under a single spec, and that won't do.
Roughly 250 of these files are read at the end of every month, and their combined size averages about 300 MB. Efficiency is very important to me in this project.
Based on my knowledge of the data, I've built a class hierarchy of what I "think" an object should look like...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Extract_Processing
{
    class Extract
    {
        private string mFilePath;
        private string mFileName;
        private FileHeader mFileHeader;
        private FileTrailer mFileTrailer;
        private List<Batch> mBatches; // A file can have many batches

        public Extract(string filePath)
        { /* Using the file path, some static method from another class would be called to parse the file somehow */ }

        // "override" is needed here; without it the method merely hides Object.ToString()
        public override string ToString()
        { /* Iterates all objects down the hierarchy to return the file in string format */ throw new NotImplementedException(); }

        public void ToFile()
        { /* Calls some method in the file-parsing static class to export the file back to storage somewhere */ }
    }

    class FileHeader
    { /* ... contains data types for all fields in this format, ToString etc */ }

    class Batch
    {
        private string mBatchNumber; // Should this be pulled out of the batch header to make LINQ querying simpler for this data set?
        private BatchHeader mBatchHeader;
        private BatchTrailer mBatchTrailer;
        private List<Transaction> mTransactions; // A batch can have multiple transactions

        public override string ToString()
        { /* Iterates through the transactions to return what the entire batch would look like in string format */ throw new NotImplementedException(); }
    }

    class BatchHeader
    { /* ... contains data types for all fields in this format, ToString etc */ }

    class Transaction
    { /* ... contains data types for all fields in this format, ToString etc */ }

    class BatchTrailer
    { /* ... contains data types for all fields in this format, ToString etc */ }

    class FileTrailer
    { /* ... contains data types for all fields in this format, ToString etc */ }
}
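As an illustration of what one of the record classes might hold, here is a rough sketch of slicing fixed-width columns with `Substring`. The column positions, the `FieldSpec` helper and the `BatchHeaderFields` class are all made up for the example, since the real layout isn't listed here.

```csharp
using System;

// Hypothetical helper: describes one fixed-width column.
public struct FieldSpec
{
    public readonly int Start;   // zero-based offset into the line
    public readonly int Length;  // width of the column in characters

    public FieldSpec(int start, int length)
    {
        Start = start;
        Length = length;
    }

    // Cuts this field out of a line and trims the padding around it.
    public string Slice(string line)
    {
        if (line.Length < Start + Length)
            throw new FormatException("Line is shorter than the field layout expects.");
        return line.Substring(Start, Length).Trim();
    }
}

// Hypothetical record class; the positions below are invented for illustration only.
public class BatchHeaderFields
{
    private static readonly FieldSpec BatchNumberSpec = new FieldSpec(1, 6);
    private static readonly FieldSpec BatchDateSpec = new FieldSpec(7, 8);

    public string BatchNumber;
    public string BatchDate;

    // Builds the record from one raw line of record type 3.
    public static BatchHeaderFields Parse(string line)
    {
        BatchHeaderFields header = new BatchHeaderFields();
        header.BatchNumber = BatchNumberSpec.Slice(line);
        header.BatchDate = BatchDateSpec.Slice(line);
        return header;
    }
}
```

Keeping the layout in static `FieldSpec` values means each record type declares its columns once, and the parsing code stays the same across all five formats.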
I've left out many constructors and other methods, but I think the idea should be pretty solid. I'm looking for ideas and critique of the approach I'm considering since, again, I'm not knowledgeable about C# and execution time is the highest priority.
My biggest question, besides wanting some critique, is: how should I bring in this file? In other languages I've read files in with FSO methods in VBA, with a Microsoft Access ImportSpec (5 times, once for each spec... wow, that was inefficient!), and with a 'Cursor' object in Visual FoxPro (which was FAAAAAAAST but, again, had to be done five times). I'm looking for hidden gems in C#, if such things exist.
Thanks for reading my novel; let me know if you're having issues understanding it. I'm taking the weekend to go over this design to see if I buy it and want to make the effort to implement it this way.
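For comparison with the five-pass imports described above, here is a minimal sketch of a single-pass read using `TextReader.ReadLine`, which routes each line by its first character as it goes. The `RawBatch`, `RawExtract` and `ExtractReader` names are hypothetical, and raw lines are kept as strings for brevity rather than parsed into fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical holder types; each record is kept as its raw line.
public class RawBatch
{
    public string Header;
    public string Trailer;
    public List<string> Transactions = new List<string>();
}

public class RawExtract
{
    public string Header;
    public string Trailer;
    public List<RawBatch> Batches = new List<RawBatch>();
}

public static class ExtractReader
{
    // Reads the input once, line by line, routing each line by its first character.
    public static RawExtract Read(TextReader reader)
    {
        RawExtract extract = new RawExtract();
        RawBatch current = null;
        string line;
        int lineNumber = 0;

        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Length == 0) continue; // assumption: blank lines can be skipped

            switch (line[0])
            {
                case '1':
                    extract.Header = line;
                    break;
                case '3':
                    current = new RawBatch();
                    current.Header = line;
                    break;
                case '5':
                    if (current == null)
                        throw new FormatException("Transaction outside a batch at line " + lineNumber);
                    current.Transactions.Add(line);
                    break;
                case '7':
                    if (current == null)
                        throw new FormatException("Batch trailer without a header at line " + lineNumber);
                    current.Trailer = line;
                    extract.Batches.Add(current);
                    current = null;
                    break;
                case '9':
                    extract.Trailer = line;
                    break;
                default:
                    throw new FormatException("Unknown record type at line " + lineNumber);
            }
        }
        return extract;
    }

    // Convenience wrapper for reading from disk.
    public static RawExtract ReadFile(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            return Read(reader);
        }
    }
}
```

Taking a `TextReader` rather than a path keeps the loop testable against an in-memory `StringReader`. Whether this is fast enough for 250 files a month would need measuring.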