I am working on a C# project that traverses an Excel document and inserts its data into a database.
The relevant details are:
- The Excel sheet has 14 rows at the top that I do not care about (sometimes 15; see Russia/Siberia below).
- The data is grouped by name into pairs of columns (date and value), for example:
Sheet 1
USA               China             Russia
Date     Value    Date     Value    Siberia
1/1/09   4.3654   1/1/09   2.7456   Date     Value
1/2/09   3.5545   1/3/09   9.3214   2/5/09   0.2454
1/3/09   3.2322   1/21/09  5.2234   2/6/09   0.5557
- The name I need to acquire is whichever is listed directly above "Date".
- I only care about data for dates we do not already have in the database. Before each column set is parsed, I will fetch the max date for that name from the database and skip anything at or before it.
- There is no guarantee that the columns will be in a constant order or have constant spacing.
- I do not want data for all names, only for those in a list I put together before the file is acquired.
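To make the "skip anything at or before the max date" rule concrete, here is a minimal sketch. It assumes each data row has already been reduced to a two-element string array (date text, value text) and that dates look like the sample above (`M/d/yy`). The names `NewRowFilter` and `NewerThan` are made up for illustration, and the max date is simply passed in rather than read from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class NewRowFilter
{
    // Keeps only the (date, value) pairs dated strictly after lastStoredDate.
    // Pairs whose cells do not parse as a date and a number are skipped.
    public static List<KeyValuePair<DateTime, decimal>> NewerThan(
        IEnumerable<string[]> dateValuePairs, DateTime lastStoredDate)
    {
        var result = new List<KeyValuePair<DateTime, decimal>>();
        foreach (string[] pair in dateValuePairs)
        {
            if (pair == null || pair.Length < 2) continue;

            DateTime date;
            decimal value;
            // Assumption: dates are formatted like "1/21/09".
            if (!DateTime.TryParseExact(pair[0], "M/d/yy", CultureInfo.InvariantCulture,
                                        DateTimeStyles.None, out date)) continue;
            if (!decimal.TryParse(pair[1], NumberStyles.Number,
                                  CultureInfo.InvariantCulture, out value)) continue;

            if (date > lastStoredDate)
                result.Add(new KeyValuePair<DateTime, decimal>(date, value));
        }
        return result;
    }
}
```

With the USA rows from the sample and a stored max date of 1/2/09, only the 1/3/09 row would be kept.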
My current plan is this:
- For each column, if the date field is at row 16, take the name from row 15 directly above it, check the database for the last date for that name, and insert only the rows whose date is later than that.
- If the date field is at row 17, do the same thing, but start the row loop at row 18.
- If the name is not in the list, skip the column. If it is, make sure to grab the column next to it for the necessary values.
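The plan above could be sketched roughly as follows, assuming the sheet has somehow already been loaded into a `string[,]` grid indexed as `[row, column]` (getting that grid is the part I am stuck on). `ColumnSet` and `HeaderScanner` are hypothetical names, rows and columns are zero-based here, and the database lookup is left out.

```csharp
using System;
using System.Collections.Generic;

public sealed class ColumnSet
{
    public string Name;       // cell directly above the "Date" header
    public int DateColumn;    // zero-based column holding the dates
    public int ValueColumn;   // column immediately to the right
    public int FirstDataRow;  // zero-based row of the first data line
}

public static class HeaderScanner
{
    // Scans each column for a "Date" header between firstHeaderRow and
    // lastHeaderRow (inclusive, zero-based) and records the column set
    // when the name above the header is in wantedNames.
    public static List<ColumnSet> Find(string[,] grid, int firstHeaderRow,
                                       int lastHeaderRow, ICollection<string> wantedNames)
    {
        var found = new List<ColumnSet>();
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);

        // Stop one column early: the value column must exist to the right.
        for (int c = 0; c + 1 < cols; c++)
        {
            for (int r = Math.Max(firstHeaderRow, 1); r <= lastHeaderRow && r < rows; r++)
            {
                string cell = (grid[r, c] ?? "").Trim();
                if (!string.Equals(cell, "Date", StringComparison.OrdinalIgnoreCase))
                    continue;

                string name = (grid[r - 1, c] ?? "").Trim();
                if (wantedNames.Contains(name))
                {
                    found.Add(new ColumnSet
                    {
                        Name = name,
                        DateColumn = c,
                        ValueColumn = c + 1,
                        FirstDataRow = r + 1
                    });
                }
                break; // at most one "Date" header per column
            }
        }
        return found;
    }
}
```

For the real sheet, the header could be on row 16 or 17 (one-based), so the call would be something like `HeaderScanner.Find(grid, 15, 16, wantedNames)`.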
My problem is:
- I am currently trying to use the ExcelDataReader from Codeplex (http://www.codeplex.com/ExcelDataReader). It only likes CSV-like sheets, and this sheet is not one.
- I do not know of any alternative Excel readers.
- To the best of my knowledge, a straight FileStream traversal of this file can only go row-by-row, rather than column-by-column.
To anyone still reading, thank you for your time. Any recommendations on how to proceed? Please ensure that solutions can traverse each column, not each row.
Also, please don't worry about the database stuff, or the list of names that precedes the traversal.
Addendum: What I'd really like to end up with is some type of table that I can just traverse with a nested loop, making column-centric traversal much, much easier. Because there is so much garbage near the top of the sheet (14+ rows), most simple solutions are not feasible.
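To illustrate the kind of table I mean: if some reader can hand me the sheet one row at a time as a `string[]`, the rows could be buffered into a rectangular grid and then walked column by column. This is only a sketch under that assumption. `GridBuilder`, `FromRows`, and `ColumnMajor` are made-up names, and where the rows come from is still the open question.

```csharp
using System;
using System.Collections.Generic;

public static class GridBuilder
{
    // Buffers row-oriented input into grid[row, column], dropping the
    // first rowsToSkip rows and padding short rows with empty strings.
    public static string[,] FromRows(IEnumerable<string[]> rows, int rowsToSkip)
    {
        var kept = new List<string[]>();
        int width = 0;
        int index = 0;
        foreach (string[] row in rows)
        {
            if (index++ < rowsToSkip) continue; // garbage at the top of the sheet
            string[] safeRow = row ?? new string[0];
            kept.Add(safeRow);
            width = Math.Max(width, safeRow.Length);
        }

        var grid = new string[kept.Count, width];
        for (int r = 0; r < kept.Count; r++)
            for (int c = 0; c < width; c++)
                grid[r, c] = c < kept[r].Length ? (kept[r][c] ?? "") : "";
        return grid;
    }

    // Column-centric traversal: the outer loop walks columns,
    // the inner loop walks the rows within each column.
    public static List<string> ColumnMajor(string[,] grid)
    {
        var cells = new List<string>();
        for (int c = 0; c < grid.GetLength(1); c++)
            for (int r = 0; r < grid.GetLength(0); r++)
                cells.Add(grid[r, c]);
        return cells;
    }
}
```

Skipping 14 rows would leave the name rows at the top of the grid, so the "Date" header would land on the second or third row depending on the column.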