Hello,
I am struggling a bit with how to unit test parsing a file. Let's say I have a file with 25 columns that could be anywhere from 20 to 1,000 records long. How do I write a unit test against that? The function takes the file contents as a string parameter and returns a DataTable with the parsed records.
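For reference, the function under test looks roughly like this (the name `ParseFile` and the static class are simplified stand-ins for my actual code):

```csharp
using System.Data;

public static class FileParser
{
    // Takes the raw file contents as a single string and returns the
    // parsed records as a DataTable (25 columns, 20-1000 rows).
    public static DataTable ParseFile(string fileContents)
    {
        // ... actual parsing logic ...
        return new DataTable();
    }
}
```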
The best I can come up with is parsing a 4-record file and only checking the top-left and bottom-right 'corners', e.g. the first few fields of the top 2 records and the last few fields of the bottom 2 records. I can't imagine tediously hand-typing assert statements for every single field in the file, and testing just one record with every field seems just as weak, since it doesn't account for multi-record files or unexpected data.
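To make that concrete, here is roughly what my current 'corners' test looks like (NUnit-style; the fixture data and expected values are made up, and the columns are trimmed down for readability):

```csharp
using System.Data;
using NUnit.Framework;

[TestFixture]
public class FileParserTests
{
    [Test]
    public void ParseFile_FourRecordFile_ChecksCorners()
    {
        // Small 4-record fixture; the real one has all 25 columns.
        string fileContents =
            "Alice,100,2019-01-01\n" +
            "Bob,200,2019-01-02\n" +
            "Carol,300,2019-01-03\n" +
            "Dave,400,2019-01-04\n";

        DataTable table = FileParser.ParseFile(fileContents);

        Assert.AreEqual(4, table.Rows.Count);

        // Top-left corner: first field of the top 2 records.
        Assert.AreEqual("Alice", table.Rows[0][0]);
        Assert.AreEqual("Bob", table.Rows[1][0]);

        // Bottom-right corner: last field of the bottom 2 records.
        Assert.AreEqual("2019-01-03", table.Rows[2][table.Columns.Count - 1]);
        Assert.AreEqual("2019-01-04", table.Rows[3][table.Columns.Count - 1]);
    }
}
```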
That seemed 'good enough' at the time. However, I'm now working on a new project that is essentially parsing various PDF files coming in from 10 different sources. Each source has 4-6 different formats for its files, so that's about 40-60 parsing routines, and we may eventually fully automate 25 additional sources down the road. We take each PDF and convert it to Excel using a 3rd-party tool, then we sit and analyze the patterns in the output and write the code that calls the tool's API, takes the Excel file, and parses it: stripping out the garbage, working around data that's in different places, cleaning it up, etc.
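The shape of each routine ends up looking something like this (every name here is hypothetical, since the converter's real API obviously differs; I've put the converter behind an interface mainly so the Excel-parsing half can at least be exercised on its own):

```csharp
using System.Data;

// Hypothetical wrapper around the 3rd-party PDF-to-Excel tool.
public interface IPdfToExcelConverter
{
    string Convert(string pdfPath); // returns the path of the Excel file
}

// One of the ~40-60 per-source, per-format parsing routines.
public class SourceXFormat1Parser
{
    private readonly IPdfToExcelConverter _converter;

    public SourceXFormat1Parser(IPdfToExcelConverter converter)
    {
        _converter = converter;
    }

    public DataTable Parse(string pdfPath)
    {
        // Step 1: convert the PDF to Excel via the 3rd-party API.
        string excelPath = _converter.Convert(pdfPath);

        // Step 2: load the raw sheet, then strip garbage rows, locate
        // data that moves around between files, and normalize values.
        DataTable raw = LoadSheet(excelPath);
        return CleanAndNormalize(raw);
    }

    private DataTable LoadSheet(string excelPath) { /* ... */ return new DataTable(); }

    private DataTable CleanAndNormalize(DataTable raw) { /* ... */ return raw; }
}
```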
Realistically, how can I unit test something like this?