Hi all,

I'm looking for best practices and ideas for managing data in a database table during a bulk data upload. I think it would be easiest to start with an example:

Suppose the database table in question contains the following records:

id | name      | ...
 1 | company A |
 2 | company B |
 3 | company C |

I then opt to bulk update this table with a new data file that contains info only for companies A and C, with the understanding that company B is no longer a desired data point.

So my question is this: I'm using an upsert technique to manage the data points listed in the data file, but how do I effectively remove the unrepresented data point (i.e., company B) from the database? Should I track timestamps and clear out rows older than the upload start time? Or should I just dump the entire table and repopulate it (at the expense of managing indices)? The latter has the disadvantage of making the table useless during the upload, but the former has concurrency and locking issues. Thoughts?
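To illustrate the timestamp idea, here's a rough sketch of what I have in mind (assuming PostgreSQL-style INSERT ... ON CONFLICT, a hypothetical last_seen column on the table, and :upload_start as a bind parameter captured by the application before processing the file):

BEGIN;

-- Upsert every row present in the data file, stamping it with
-- the upload's start time:
INSERT INTO companies (id, name, last_seen)
VALUES (1, 'company A', :upload_start),
       (3, 'company C', :upload_start)
ON CONFLICT (id) DO UPDATE
    SET name      = EXCLUDED.name,
        last_seen = EXCLUDED.last_seen;

-- Rows not touched by this upload still carry an older timestamp,
-- so company B can be swept out in the same transaction:
DELETE FROM companies
WHERE last_seen < :upload_start;

COMMIT;

Running the upsert and the delete in one transaction would at least keep readers from ever seeing a half-updated table, though I'm unsure how it behaves under heavy concurrent writes.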

+2  A: 

I put the data into a work table and then use SQL code to insert or update only the records I want, rather than bulk inserting directly into a production table.

Sorry, I misunderstood: you want to know how to get rid of records that are no longer useful? Again, bulk insert into a work table, then join against that table to identify which production records are no longer represented and remove them with a DELETE statement.
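Something along these lines (a rough sketch in T-SQL, assuming a staging table named companies_stage that mirrors the production companies table; adjust names and the load step to your setup):

-- 1. Bulk load the data file into the work table first, e.g.:
-- BULK INSERT companies_stage FROM '<path to upload file>' WITH (...);

-- 2. Update existing production rows from the staged data:
UPDATE c
SET    c.name = s.name
FROM   companies c
JOIN   companies_stage s ON s.id = c.id;

-- 3. Insert rows that are new in this upload:
INSERT INTO companies (id, name)
SELECT s.id, s.name
FROM   companies_stage s
WHERE  NOT EXISTS (SELECT 1 FROM companies c WHERE c.id = s.id);

-- 4. Remove production rows no longer represented in the upload
--    (this is what takes care of company B):
DELETE c
FROM   companies c
WHERE  NOT EXISTS (SELECT 1 FROM companies_stage s WHERE s.id = c.id);

The nice part is that steps 2 through 4 are ordinary set-based statements, so you can validate or clean the staged data before any of them run.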

HLGEM
Once the data is uploaded into a "working" area, you've got a lot of options open to you -- including getting a lot of work done before you ever modify the "main" table.
Philip Kelley
I never even considered staging the new data in a temp table. This will work well. Thanks!
trydionel
I never do an import of data without staging it first. Nothing goes into my prod tables until I have a chance to clean it up.
HLGEM