I am currently working on a project to parse an Excel sheet and insert into a database any values that were not inserted previously. The sheet contains roughly 80 date-value series (one per name), with an average of about 1500 rows per series.

Each name has 5 date-value pairs entered manually at the end of the week. Over the weekend, my process will parse the Excel file and insert any values that are not currently in the database.

My question is: given the large amount of total data and the small amount added each week, how would you easily determine which values need to be inserted? I have considered adding another table to store the last date inserted for each name and taking any rows after that date.

+1  A: 

The simplest solution: I would bring it all into a staging table and do the comparison on the server. Alternatively, an SSIS package with an appropriate sort and lookup could determine the differences and insert them.

120,000 rows is not a significant amount to compare in the database using SQL, but 120,000 individual calls to verify whether each row is already in the database might take a while from the client side.
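
A minimal T-SQL sketch of that staging-table compare, assuming the Excel rows have been bulk-loaded into a staging table first; the names staging_readings and readings(name, reading_date, value) are illustrative, not from the question:

    -- Insert only the staged rows that are not already present,
    -- matching on the (name, reading_date) pair.
    INSERT INTO readings (name, reading_date, value)
    SELECT s.name, s.reading_date, s.value
    FROM staging_readings AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM readings AS r
        WHERE r.name = s.name
          AND r.reading_date = s.reading_date
    );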

Cade Roux
+1  A: 

Option 1 would be to create a "lastdate" table that is automatically stamped at the end of your weekend import. Then the next week your program could query the last record in that table and only read rows from the Excel file after that date. Probably your best bet.
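
One way that bookkeeping might look (a sketch only; the table and column names are made up for illustration):

    -- Stamped once at the end of each weekend import.
    CREATE TABLE import_log (
        imported_at       DATETIME NOT NULL,
        last_reading_date DATE     NOT NULL
    );

    -- Next week, read the cutoff and skip any Excel rows on or before it.
    SELECT MAX(last_reading_date) AS cutoff
    FROM import_log;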

Option 2 would be to find a unique field in the data and check row by row whether that key already exists in the database. If it doesn't exist, you add it; if it does, you skip it. This would be my second choice if Option 1 doesn't work the way you expect.
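
A per-row existence check might look like this (a sketch, assuming (name, reading_date) is the unique key; @name, @reading_date, and @value would be bound by the client for each Excel row):

    -- Insert only if this (name, date) pair is not already stored.
    IF NOT EXISTS (
        SELECT 1 FROM readings
        WHERE name = @name AND reading_date = @reading_date
    )
        INSERT INTO readings (name, reading_date, value)
        VALUES (@name, @reading_date, @value);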

Jeremy Morgan
A: 

It all depends on how bulletproof your solution needs to be. If you trust the users that the spreadsheet will not be tweaked in any way that would make it inconsistent, then your solution would be fine.

If you want to be on the safe side (e.g. if some old values could potentially change), you would need to compare the whole thing against the database. To be honest, the amount of data you are talking about here doesn't seem very big, especially since your process will run on a weekend. And you can still optimize by writing "batch"-style stored procedures for the database.
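
If the safe-side route is taken, the full compare could be a single set-based statement rather than per-row calls. A sketch, reusing the illustrative readings/staging_readings schema from above (MERGE requires SQL Server 2008 or later):

    -- Compare the whole spreadsheet against the table:
    -- correct changed values and insert missing rows in one pass.
    MERGE readings AS r
    USING staging_readings AS s
        ON r.name = s.name AND r.reading_date = s.reading_date
    WHEN MATCHED AND r.value <> s.value THEN
        UPDATE SET r.value = s.value
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (name, reading_date, value)
        VALUES (s.name, s.reading_date, s.value);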

Grzenio
A: 

Thanks for the answers, all.

I have decided that, rather than creating a new table to store the last date, I will just select the max date for each name and insert any values after that date.

This assumes that the data prior to the last date remains consistent, which should be fine for this problem.
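
A sketch of that approach, again with the illustrative names from above and assuming the week's Excel rows are loaded into a staging table first:

    -- Insert only rows newer than each name's latest stored date.
    INSERT INTO readings (name, reading_date, value)
    SELECT s.name, s.reading_date, s.value
    FROM staging_readings AS s
    LEFT JOIN (
        SELECT name, MAX(reading_date) AS max_date
        FROM readings
        GROUP BY name
    ) AS m ON m.name = s.name
    WHERE m.max_date IS NULL OR s.reading_date > m.max_date;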

Norla