tags:

views:

46

answers:

2

I have a CSV file. The first row will always contain column headers. Depending on a variety of factors, the order of columns may change and, in rare circumstances, some columns may not be present. These changes are beyond my control.

My thoughts, so far, on how to address this. I'll read the first row of the file and use the values to generate a list of columns contained in the source file. The destination file will use the same column names as the source. This should be as simple as searching for identical names in the source and destination, then just mapping the column index values, right?

What are your recommendations for handling this?

A: 

I did this once by building a Hash Map of the names of the columns I expected to be there to the indexes (or actual column names) of the the column headers that were actually present. I did this by first building the map with all the column names I expected as keys and some value like -1 as the values. I then got the array of column headers. With a nested loop going through both the loop for all keys in the map and all headers present in the file, I did a case insensitive comparison after trimming out the whitespace, if there was a match I put the index of the column as the value for that key in the map. Then when building the destination file all I had to do was loop through the keys in the map for every line in the CSV and get the data from the index specified in the map and do whatever I want with the data, ignoring the column if the value in the map was -1. I did this in Java, but I imagine it is nearly exactly the same in C#.

Adam
A: 

If I was going to do this I would use SQL and DAO. There are 2 tricks. the first one is to establish a connection to the Excel worksheet as if it was a database table (you will need the DAO 3.6 object library as a reference in the VBA),

Dim dbtmp As dao.Database Dim qd As dao.QueryDef

Set dbtmp = OpenDatabase(mPath & "\" & mName, False, True, "Excel 8.0;") Set qd = dbtmp.CreateQueryDef("", " THE QUERY ") qd.Execute

and the second one is to establish a connection to the CSV file as a datasource. The format of that is something like:

mQuery = "SELECT * FROM [Text;HDR=NO;CharacterSet=437;DATABASE=" mQuery = mQuery & mpath mQuery = mQuery & "]." mQuery = mQuery & mfile mQuery = mQuery & ";"

You will have to build the SQL using code, you need to establish the fields in the sourcefile, and you are building an INSERT INTO query. And it is probably simplest to create the select query to the source data with a name, and then use it in the insert into query

The benefit of this is that the SQL will take care of the missing fields in the source data, and it will not be fussy about the order the columns appear in. You can get the headings by opening the csv as a text file.

alastair harris