I have a process that consolidates 40+ identically structured databases down to one consolidated database, the only difference being that the consolidated database adds a project_id field to each table.
In order to be as efficient as possible, I'm try to only copy/update a record from the source databases to the consolidated database if it's been added/changed. I delete outdated records from the consolidated database, and then copy in any non-existing records. To delete outdated/changed records I'm using a query similar to this:
DELETE FROM <table>
WHERE NOT EXISTS (SELECT <primary keys>
FROM <source> b
WHERE ((<b.fields = a.fields>) or
(b.fields is null and a.fields is null)))
AND PROJECT_ID = <project_id>
This works for the most part, but one of the tables in the source database has over 700,000 records, and this query takes over an hour to complete.
How can make this query more efficient?