Hi all,
I'm looking for best practices and ideas for managing data in a database table during a bulk data upload. I think it would be easiest to start with an example:
Suppose the database table in question contains the following records:
id | name | ...
1 | company A |
2 | company B |
3 | company C |
I then bulk-update this table from a new data file that contains info only for companies A and C, with the understanding that company B is no longer a desired data point.
So my question is this: I'm using an upsert technique to manage the data points listed in the file, but how do I effectively remove the unrepresented data point (i.e., company B) from the database? Should I track timestamps and clear out rows older than the upload start time? Or should I just dump the entire table and repopulate it (at the expense of managing indices)? The latter has the disadvantage of making the table useless during the upload, while the former raises concurrency and locking issues. Thoughts?
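For concreteness, here's a rough sketch of the timestamp approach I have in mind (Python with sqlite3 just to keep it self-contained; the companies table, the last_seen column, and the file contents are all made up for illustration):

```python
import sqlite3
import time

# Hypothetical schema for illustration: a "companies" table with a
# "last_seen" column that every upload stamps with its start time.
conn = sqlite3.connect("example.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS companies (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        last_seen REAL NOT NULL
    )
""")

upload_start = time.time()
incoming = [(1, "company A"), (3, "company C")]  # company B absent from the file

with conn:  # one transaction, so readers never see a half-finished load
    # Upsert every row in the data file, stamping it with the load time.
    conn.executemany(
        """
        INSERT INTO companies (id, name, last_seen)
        VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            last_seen = excluded.last_seen
        """,
        [(cid, name, upload_start) for cid, name in incoming],
    )
    # Any row not touched by this load is no longer represented: delete it.
    conn.execute("DELETE FROM companies WHERE last_seen < ?", (upload_start,))
```

Wrapping the upsert and the delete in one transaction at least avoids a window where readers see company B gone before the load finishes, but I'm not sure how well that holds up under heavier concurrency, hence the question.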