Hi,
I currently have a table of 3 million records that needs updating nightly. The data that populates this table comes from ~100 APIs that all get normalised into one jumbo table.
Problem: How do I reflect new records being added and records being deleted at the source?
Facts: I can't truncate the table every night and reinsert. Each API provides a constant ID for each record (so I can keep track of what's what). Some fields will be updated each night.
Solutions: New records are easy: I just add them to my table with an AvailableFrom date. Updates are also easy: for each record I check whether it exists and whether its data has changed (though performance will suck).
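To make the update check less painful, one option might be to store a hash per record and compare hashes instead of comparing every field. This is only a rough sketch: `row_hash`, `classify` and the `stored_hashes` lookup (constant ID mapped to last night's hash) are names I've made up for illustration, not anything that exists in my system yet.

```python
import hashlib
import json


def row_hash(record: dict) -> str:
    """Return a stable SHA-256 digest of a record's fields."""
    # sort_keys makes the digest independent of key order;
    # default=str covers dates and other non-JSON values.
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def classify(source_id: str, record: dict, stored_hashes: dict) -> str:
    """Return 'new', 'changed' or 'unchanged' for one incoming record.

    stored_hashes is a hypothetical {constant ID: hash} lookup built
    from the previous night's load.
    """
    old = stored_hashes.get(source_id)
    if old is None:
        return "new"
    return "unchanged" if old == row_hash(record) else "changed"
```

With this, only records classified as "new" or "changed" would need a write, which should cut the nightly work down if most of the data is stable.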
Deleted records are where I'm stuck. The APIs just dump a load of data on me, so how do I tell if a record has "dropped off"?
I'm thinking a swap table of some sort - any ideas?
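Here is roughly what I have in mind, as a sketch only: load tonight's dump into a staging table, then close off any live record whose ID is missing from it. The table and column names (`records`, `staging`, `api_name`, `source_id`, `available_to`) are assumptions for the example, and I've used SQLite just so it runs on its own. An AvailableTo date is also my assumption, as the counterpart to AvailableFrom.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical main table and the nightly staging table."""
    conn.executescript(
        """
        CREATE TABLE records (
            api_name       TEXT NOT NULL,
            source_id      TEXT NOT NULL,
            available_from TEXT NOT NULL,
            available_to   TEXT,            -- NULL means still live
            PRIMARY KEY (api_name, source_id)
        );
        CREATE TABLE staging (
            api_name  TEXT NOT NULL,
            source_id TEXT NOT NULL,
            PRIMARY KEY (api_name, source_id)
        );
        """
    )


def mark_dropped(conn: sqlite3.Connection, run_date: str) -> int:
    """Set available_to on live records absent from staging.

    Only APIs that delivered at least one row tonight are considered,
    so a failed or empty feed does not flag all of its records as deleted.
    Returns the number of records closed off.
    """
    cur = conn.execute(
        """
        UPDATE records
           SET available_to = ?
         WHERE available_to IS NULL
           AND api_name IN (SELECT DISTINCT api_name FROM staging)
           AND NOT EXISTS (
                 SELECT 1
                   FROM staging s
                  WHERE s.api_name = records.api_name
                    AND s.source_id = records.source_id
               )
        """,
        (run_date,),
    )
    conn.commit()
    return cur.rowcount
```

The staging table would be truncated and reloaded each night, which is fine because it is only scratch space. The main table is never truncated. Is this a sensible way to go, or is there a better pattern?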