tags:
views: 194
answers: 4

I have to read invoice ascii files that are structured in a really convoluted way, for example:

55651108 3090617.10.0806:46:32101639Example Company               Construction Company          Example Road. 9            9524 Example City

There's actually additional stuff in there, but I don't want to confuse you any further.

I know I'm doomed if the client can't offer a better structure. For instance, 30906 is an iterative number that grows, and 101639 is the CustomerId. The whitespace between "Example Company" and "Construction Company" is of variable length. The field "Example Company" could itself contain whitespace of variable length, for instance "Microsoft Corporation Redmond". Same with the other fields. So there's no clear way to extract data from the latter part.

But that's not the question; I got carried away. My question is as follows:

If the input were somewhat structured and well defined, how would you guard against future changes in its structure? How would you design and implement a reader?

I was thinking of using a simple EAV Model in my DB, and use text or xml templates that describe the input, the entity names, and their valuetypes. I would parse the invoice files according to the templates.
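
As a rough illustration of that idea, here is a minimal sketch of a template-driven reader in Python. The template format, field names, widths, and types below are all illustrative assumptions, not the actual invoice layout:

```python
# Hypothetical template: one field per line, "name:width:type".
# Everything here (names, widths, the format itself) is an assumption.
TEMPLATE = """\
invoice_no:8:int
filler:1:str
sequence:5:int
date:8:str
time:8:str
customer_id:6:int
company:30:str
"""

CONVERTERS = {"int": int, "str": str.strip}

def parse_template(text):
    """Turn the template text into a list of (name, width, converter)."""
    fields = []
    for entry in text.strip().splitlines():
        name, width, typ = entry.split(":")
        fields.append((name, int(width), CONVERTERS[typ]))
    return fields

def parse_record(line, fields):
    """Slice one fixed-width line according to the template."""
    record, offset = {}, 0
    for name, width, convert in fields:
        record[name] = convert(line[offset:offset + width])
        offset += width
    return record

fields = parse_template(TEMPLATE)
line = ("55651108 3090617.10.0806:46:32101639"
        + "Example Company".ljust(30))
rec = parse_record(line, fields)
```

When the file layout changes, only the template changes; the parsing code stays the same.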

A: 

I believe that a template describing the entity names and the value types is a good approach: something like a "schema" for a text file.

What I would try to do is separate the reader from the rest of the application as much as possible. So the question really is how to define an interface that can accommodate changes in the parameter list. This may not always be possible, but still, if you rely on an interface to read the data, you can change the implementation of the reader without affecting the rest of the system.
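
For example, a minimal sketch of such an interface in Python (the class names, field names, and layout here are hypothetical, not part of the question's actual format):

```python
from abc import ABC, abstractmethod

class InvoiceReader(ABC):
    """Abstract reader: the rest of the system depends only on this."""
    @abstractmethod
    def read(self, line):
        """Return a dict mapping field name -> value for one input line."""

class FixedWidthReader(InvoiceReader):
    """One concrete implementation; swap it out if the format changes."""
    def __init__(self, layout):
        # layout: list of (name, width) pairs -- illustrative
        self.layout = layout
    def read(self, line):
        record, offset = {}, 0
        for name, width in self.layout:
            record[name] = line[offset:offset + width].strip()
            offset += width
        return record

reader = FixedWidthReader([("invoice_no", 8), ("filler", 1), ("sequence", 5)])
record = reader.read("55651108 30906")
```

Code that consumes invoices only ever sees `InvoiceReader`, so a new file format means a new subclass, not changes throughout the system.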

kgiannakakis
A: 

Well, your file format looks much like the French protocol called Etebac, used between banks and their customers.

It's a fixed-width text format.

The best you can do is use some kind of unpack function:

$ perl -MData::Dumper -e 'print Dumper(unpack("A8 x A5 A8 A8 A6 A30 A30", "55651108 3090617.10.0806:46:32101639Example Company               Construction Company          Example Road. 9            9524 Example City"))'
$VAR1 = '55651108';
$VAR2 = '30906';
$VAR3 = '17.10.08';
$VAR4 = '06:46:32';
$VAR5 = '101639';
$VAR6 = 'Example Company';
$VAR7 = 'Construction Company';
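
The same idea translates to Python's struct module, if that's closer to your stack. The format string below mirrors the Perl template; note that unlike Perl's A specifier, Python's s does not strip trailing padding, so it is stripped manually (the widths are guesses from the sample line):

```python
import struct

# Sample record, padded to the assumed fixed widths.
line = ("55651108 3090617.10.0806:46:32101639"
        + "Example Company".ljust(30)
        + "Construction Company".ljust(30))

# Mirrors the Perl template "A8 x A5 A8 A8 A6 A30 A30":
# 8 bytes, skip 1, 5 bytes, 8, 8, 6, 30, 30.
fmt = "8sx5s8s8s6s30s30s"
parts = struct.unpack(fmt, line.encode("ascii"))

# struct's "s" keeps trailing spaces, unlike Perl's "A", so strip them.
values = [p.decode("ascii").rstrip() for p in parts]
```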

What you should do is check every field you extract against what it's supposed to look like (e.g. that it matches XX.XX.XX or YY:YY:YY, or that it does not start with a space), and abort if it doesn't.
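
A sketch of that kind of sanity check in Python, assuming the record has already been split into named fields (the field names and patterns are illustrative):

```python
import re

# Illustrative per-field shape checks, in the spirit of XX.XX.XX / YY:YY:YY.
VALIDATORS = {
    "date": re.compile(r"\d{2}\.\d{2}\.\d{2}$"),
    "time": re.compile(r"\d{2}:\d{2}:\d{2}$"),
    "customer_id": re.compile(r"\d{6}$"),
}

def validate(record):
    """Raise (i.e. abort) as soon as a field doesn't match its expected shape."""
    for name, pattern in VALIDATORS.items():
        value = record[name]
        if value.startswith(" ") or not pattern.match(value):
            raise ValueError("field %r malformed: %r" % (name, value))

validate({"date": "17.10.08", "time": "06:46:32", "customer_id": "101639"})
```

Failing loudly on the first malformed field is much safer than silently loading shifted columns into the database.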

mat
A: 

I'd have a database of invoice data, with tables such as Company, Invoices, Invoice_Items. It depends on complexity: do you wish to record your orders as well, and then link invoices to the orders, and so on? But I digress...

I'd have an in-memory model of the database model, but that's a given. If XML output and input was needed, I would have an XML serialisation of the model if I needed to supply the invoices as data elsewhere, and a SAX parser to read it in. Some APIs might make this trivial to do, or maybe you just want to expose a web service to your repository if you are going to have clients reading from you.

As for reading in the text files (there isn't much information about them: why would their format change? Where are they coming from? Are you replacing this system, or will it keep running, with you just a new backend that it feeds?): you say the number of spaces is variable. Is that just because the format uses fixed-width columns? I would create a reader that reads them into your model, and hence your database schema.

JeeBee
+1  A: 

"If the input was somewhat structured and well defined, how would you guard against future changes in its structure. How would you design and implement a reader?"

You must define the layout in a way you can flexibly pick it apart.

Here's a Python version:

class Field( object ):
    def __init__( self, name, size ):
        self.name = name
        self.size = size
        self.offset = None   # filled in by Record

class Record( object ):
    def __init__( self, *fieldList ):   # accepts Field(...), Field(...), ...
        self.fields = fieldList
        self.fieldMap = {}
        offset = 0
        for f in self.fields:
            f.offset = offset
            offset += f.size
            self.fieldMap[f.name] = f
    def parse( self, aLine ):
        self.buffer = aLine
    def get( self, aField ):
        fld = self.fieldMap[aField]
        return self.buffer[ fld.offset : fld.offset + fld.size ]
    def __getattr__( self, aField ):
        try:
            return self.get( aField )
        except KeyError:
            raise AttributeError( aField )

Now you can define records

myRecord= Record( 
    Field('aField',8), 
    Field('filler',1), 
    Field('another',5),
    Field('somethingElse',8),
)

This gives you a fighting chance of picking apart some input in a reasonably flexible way.

myRecord.parse(aLine)       # aLine: one fixed-width record from the file
myRecord.get('aField')      # or simply myRecord.aField

Once you can parse, adding conversions is a matter of subclassing Field to define the various types (dates, amounts, etc.)
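
For instance, a sketch of such subclasses (a minimal Field base with a convert hook is repeated here so the snippet stands alone; the hook itself and the cents-based amount encoding are assumptions, not part of the answer above):

```python
import datetime

class Field:
    """Minimal base, mirroring the class above, plus a convert hook."""
    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.offset = None
    def convert(self, text):
        return text.strip()

class DateField(Field):
    """Parses a dd.mm.yy column (like '17.10.08') into a date object."""
    def convert(self, text):
        return datetime.datetime.strptime(text.strip(), "%d.%m.%y").date()

class AmountField(Field):
    """Assumes the amount column stores cents: '000012345' -> 123.45."""
    def convert(self, text):
        return int(text) / 100.0

d = DateField("date", 8)
value = d.convert("17.10.08")
```

Record.get would then call the field's convert method instead of returning the raw slice, so each column type carries its own parsing rule.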

S.Lott