I have to read ASCII invoice files that are structured in a really convoluted way, for example:
55651108 3090617.10.0806:46:32101639Example Company Construction Company Example Road. 9 9524 Example City
There's actually additional stuff in there, but I don't want to confuse you any further.
I know I'm doomed if the client can't offer a better structure. For instance, 30906 is a sequence number that increments, and 101639 is the CustomerId. The whitespace between "Example Company" and "Construction Company" is of variable length. The field "Example Company" could itself contain whitespace of variable length, however, for instance "Microsoft Corporation Redmond". The same goes for the other fields. So there's no clear way to extract data from the latter part.
But that's not the question; I got carried away. My question is as follows:
If the input were reasonably structured and well defined, how would you guard against future changes in its structure? How would you design and implement a reader?
I was thinking of using a simple EAV model in my DB, together with text or XML templates that describe the input: the entity names and their value types. I would then parse the invoice files according to the templates.
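To make the idea concrete, here is a rough Python sketch of what I mean by a template-driven reader. Everything in it is an assumption for illustration: the field names (`record_id`, `sequence_no`, and so on), the offsets and widths, and the `dd.mm.yy` date format are guesses based on the sample line above, not a confirmed spec. In practice the template would be loaded from a text or XML file rather than hard-coded, so a change in the input layout means shipping a new template, not new parsing code.

```python
from dataclasses import dataclass
from datetime import datetime

# Converters keyed by the value type named in the template (hypothetical names).
CONVERTERS = {
    "int": int,
    "str": str.strip,
    "date": lambda s: datetime.strptime(s, "%d.%m.%y").date(),  # assumed dd.mm.yy
    "time": lambda s: datetime.strptime(s, "%H:%M:%S").time(),
}


@dataclass(frozen=True)
class FieldSpec:
    name: str        # attribute name, as it would appear in the EAV table
    start: int       # zero-based offset of the field in the line
    length: int      # fixed width of the field
    value_type: str  # key into CONVERTERS


# Template "version 1": offsets guessed from the sample line, not a real spec.
TEMPLATE_V1 = [
    FieldSpec("record_id", 0, 8, "int"),
    FieldSpec("sequence_no", 9, 5, "int"),
    FieldSpec("invoice_date", 14, 8, "date"),
    FieldSpec("invoice_time", 22, 8, "time"),
    FieldSpec("customer_id", 30, 6, "int"),
]


def parse_line(line, template):
    """Return {attribute: value} for the fixed-width fields in the template."""
    result = {}
    for spec in template:
        raw = line[spec.start:spec.start + spec.length]
        if len(raw) < spec.length:
            # Fail loudly when the line is shorter than the template expects.
            raise ValueError(f"line too short for field {spec.name!r}")
        try:
            result[spec.name] = CONVERTERS[spec.value_type](raw)
        except ValueError as exc:
            # A conversion failure usually means the layout has changed.
            raise ValueError(f"field {spec.name!r}: cannot parse {raw!r}") from exc
    return result


sample = "55651108 3090617.10.0806:46:32101639Example Company"
parsed = parse_line(sample, TEMPLATE_V1)
for attribute, value in parsed.items():
    print(attribute, value)  # each pair would become one EAV row
```

The strict length and conversion checks are deliberate: if the client changes the layout, I would rather have the reader reject the file than silently store shifted values. This only covers the fixed-width leading part; the free-text tail is still the unsolved problem described above.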