tags:

views:

183

answers:

5

I am receiving a large list of current account numbers daily, and storing them in a database. My task is to find added and released accounts from each file. Right now, I have 4 SQL tables, (AccountsCurrent, AccountsNew, AccountsAdded, AccountsRemoved). When I receive a file, I am adding it entirely to AccountsNew. Then running the below queries to find which we added and removed.

INSERT AccountsAdded(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew WHERE AccountNumber not in (SELECT AccountNum FROM AccountsCurrent)

INSERT AccountsRemoved(AccountNum, Name) SELECT AccountNum, Name FROM AccountsCurrent WHERE AccountNumber not in (SELECT AccountNum FROM AccountsNew)

TRUNCATE TABLE AccountsCurrent

INSERT AccountsCurrent(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew

TRUNCATE TABLE AccountsNew

Right now, I am differencing about 250,000 accounts, but this is going to keep growing. Is this the best method, do you have any other ideas?

EDIT: This is an MSSQL 2000 database. I'm using c# to process the file.

The only data I am focused on is the accounts that were added and removed between the last and current files. The AccountsCurrent, is only used to determine what accounts were added or removed.

+1  A: 

Sounds like a history/audit process that might be better done using triggers. Have a separate history table that captures changes (e.g., timestamp, operation, who performed the change, etc.)

New and deleted accounts are easy to understand. "Current" accounts implies that there's an intermediate state between being new and deleted. I don't see any difference between "new" and "added".

I wouldn't have four tables. I'd have a STATUS table that would have the different possible states, and ACCOUNTS or the HISTORY table would have a foreign key to it.

duffymo
The problem there is that I don't want to end up with accounts in the current table with status 'REMOVED'. I also don't need a history, I only need the list of Adds and Removes in the current file. The current list is really used with the next file to determine if an account was added or removed.
scottm
+1  A: 

To be honest, I think that I'd follow something like your approach. One thing is that you could remove the truncate, do a rename of the "new" to "current" and re-create "new".

Tim Almond
THAT'S an interesting thought. I like it...
duffymo
+1  A: 

Using IN clauses on long lists can be slow.

If the tables are indexed, using a LEFT JOIN can prove to be faster...

INSERT INTO [table] (
    [fields]
    )
SELECT
    [fields]
FROM
    [table1]
LEFT JOIN
    [table2]
        ON [join condition]
WHERE
    [table2].[id] IS NULL

This assumes 1:1 relationships and not 1:many. If you have 1:many you can do any of...
1. SELECT DISTINCT
2. Use a GROUP BY clause
3. Use a different query, see below...

INSERT INTO [table] (
    [fields]
    )
SELECT
    [fields]
FROM
    [table1]
WHERE
    EXISTS (SELECT * FROM [table2] WHERE [condition to match tables 1 and 2])

-- # This is quick provided that all fields to match the two tables are
-- # indexed in both tables.  Should then be much faster than the IN clause.
Dems
Good reminder, I'll have to do some testing to see which is faster
scottm
A: 

You could also subtract the intersection to get the differences in one table.

Clay
A: 

If the initial file is ordered in a sensible and consistent way (big IF!), it would run considerably faster as a C# program which logically compared the files.

James Anderson
That would be nice, but it's not ordered.
scottm
Oh well -- the time to sort the file plus the comparison would probably be about the same as your sql based solution!
James Anderson