views: 293
answers: 1

My company has a problem: we suspect that the NACHA files we are receiving from one of our application service providers that we use to draw money from our clients are incorrect.

We have all of the ACH agreements and legal mumbo-jumbo in place, so it's not a problem with our use of the ACH network. We're also not hearing from the banks that anything is going wrong, so we suspect that when the file is built from the sales information, it's missing some transactions that our service provider is still charging us for.

My task: Take several months' worth of NACHA files and decipher them to find out what was drawn from each customer and what was deposited to our accounts, then compare that against sales data, bank statements, and other information via Access/Excel, using MySQL to hold the data.

At this point, awk (or a similar Linux command-line tool) is the tool I have; I'm not proficient with 'actual' programming tools or practices, I'm more of a system-and-database administrator. I'm not afraid to get my hands dirty, I just don't have much experience parsing this sort of thing in, say, C#.

My chief difficulty is the NACHA file format itself: each record is 94 characters wide, with fields determined solely by their position, no delimiters. In my previous experience, awk depends on the field-separator variable, which splits on whitespace or some other delimiter, and I have been unsuccessful using it to tease out fields by position. I need something like awk because each file contains five different record types: 1, 5, 6, 8, and 9. Types 1 and 9 are the file-level header and control records, 5 and 8 are the batch-level header and control records, and type 6 lines are the details.

My original plan was to read the header info into variables and then repeat it on each detail line, basically de-normalizing everything into one large table (or a CSV, in the interim) with one record per individual transaction, carrying all the header info from its batch and file, so:

 [transaction data1, data2],[batch data1, data2],[file info1, info2, etc] 
 [transaction data1, data2],[batch data1, data2],[file info1, info2, etc]
 [transaction data1, data2],[batch data1, data2],[file info1, info2, etc]
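For what it's worth, plain awk can already do this kind of positional slicing with `substr()`, without any field-separator tricks. Below is a rough sketch of the header-carrying idea: the field positions follow the common NACHA layout (file creation date at 24-29 in the type 1 record, company name at 5-20 and company ID at 41-50 in the type 5 record, amount at 30-39 and individual ID at 40-54 in the type 6 record), but verify every offset against your own copy of the spec, and note that the three input records here are fabricated samples, not real data:

```shell
# Sketch only: offsets follow the common NACHA layout -- verify them
# against your spec. Three fabricated sample records stand in for a file.
{
  printf '1%22s100302\n' ''                        # type 1: creation date at 24-29
  printf '5%3sACME WIDGETS%24s1234567890\n' '' ''  # type 5: name 5-20, company ID 41-50
  printf '6%28s0000012500CUST0001\n' ''            # type 6: amount 30-39, indiv. ID 40-54
} > input.ach

awk '
  # f(): slice a 1-based fixed-width field from the current record, trim padding
  function f(p, n,   v) { v = substr($0, p, n); gsub(/^ +| +$/, "", v); return v }
  /^1/ { file_hdr  = f(24, 6) }                # remember file-level header fields
  /^5/ { batch_hdr = f(5, 16) "," f(41, 10) }  # remember batch-level header fields
  /^6/ { print f(40, 15) "," f(30, 10) "," batch_hdr "," file_hdr }  # one CSV row per detail
' input.ach > denormalized.csv

cat denormalized.csv
# CUST0001,0000012500,ACME WIDGETS,1234567890,100302
```

Because awk processes records in order, the type 1 and type 5 assignments naturally stay in scope until the next header overwrites them, which is exactly the denormalization described above.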

I favor building a tool that can keep doing this going forward, because it will become part of the data monitoring we do on a daily/weekly basis.

So, how can I denormalize a NACHA file using awk or some similar tool? If there's a better tool for the job, I'm more than happy to hear about it. I haven't found anything in my searching online, unfortunately.

+1  A: 

If you look at the gawk info file (info gawk), there is a section called "3.6 Reading Fixed-Width Data". That may provide the information you need if you're using gawk.

From that file:

The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in variable `FIELDWIDTHS'.

Dennis Williamson
Excellent! Thank you very much for pointing that out; I had gotten so wrapped up in the (unfortunately dated) awk reference I had, and in looking at scripts online, that I never checked the info page.
atroon
It works! Thank you very much.
atroon