views:

347

answers:

6

I have a table with an identity field. What the best SQL query to get the Ids of all the deleted records from this table?

+2  A: 

You could use a recursive query:

DECLARE @MaxId int
SELECT @MaxId = SELECT IDENT_CURRENT('MyTable');

WITH Ids AS (
    SELECT 1 AS intId
    UNION ALL
    SELECT intId + 1
    FROM Ids
    WHERE intId < @MaxId
)
SELECT intId
FROM Ids AS i
WHERE NOT EXISTS (
    SELECT NULL FROM MyTable AS m
    WHERE m.intId = i.intId
)
OPTION (MAXRECURSION 0)

Though this won't be very efficient if the table is very large.

Paul
+1 because this works *IF* you change the "SET @MaxId = ...." line to "SELECT @MaxId = SELECT IDENT_CURRENT('MyTable');"
AdaTheDev
Good point, thanks. I've changed the answer accordingly
Paul
This doesn't work for a sequence of missing Ids though.
fung
Why doesn't it?
Paul
Not too sure myself. Seems like it should but it didn't when I ran it against my data. Your other query did however.
fung
+2  A: 

A completely different way to do it is this:

SELECT a.intId, b.intId
FROM MyTable a
    CROSS JOIN MyTable b
WHERE a.intId + 1 < b.intId
    AND NOT EXISTS (
        SELECT NULL FROM MyTable c
        WHERE c.intId > a.intId
            AND c.intId < b.intId
    )

Which will give pairs of IDs between which all the records have been removed.

So if the IDs were (1, 2, 3, 6, 7, 12), it would return (3, 6) and (7, 12).

EDIT:

This is very inefficient if the table is large. The following method is much better:

SELECT g.intStartId, MIN(t.intId) AS intEndId
FROM (
    SELECT intId AS intStartId
    FROM MyTable AS a
    WHERE NOT EXISTS (
        SELECT NULL FROM MyTable AS b
        WHERE b.intId = a.intId + 1
    )
) AS g
    CROSS JOIN MyTable AS t
WHERE t.intId > g.intStartId
GROUP BY g.intStartId

So we first find IDs that mark the start of a gap, and then we find the lowest ID we have greater than each to mark the end of the gap.

Paul
This actually works quite well but takes really long to complete due to the clause 'WHERE c.intId > a.intId AND c.intId < b.intId'. 93% in the execution plan for my case. Don't suppose there is any way to optimize that.
fung
See the edit above.
Paul
This works but you should be aware that it wont get just 'DELETED' records but failed inserts and rolled back transactions also can increment the identity count.
keithwarren7
Yes, this is true. I believe the only way of getting only deleted rows is to record the fact at the point of deletion.
Paul
I'm not looking for deleted records per se. More appropriately the missing records so this works.
fung
+1  A: 

One option is to create a temporary table/embeddable SQL statement containing all possible IDs (some options are outlined in this article) up to an including the max(identity) of your table.

Then you can left join from this canonical list of values with your table and filter on a null values on the right side.

Unsliced
This was the plausible way I was thinking of but seems resource intensive to me. And hoping to not have to muddle with temp tables.
fung
you have to be able to identify what you don't have, and given that the series of missing items are of indeterminate length, you can't accomodate them with arbitrary joins. @Paul offers a neatish solution which you could unpack - but you need either to loop or construct alternatives, which will be expensive in either case.
Unsliced
+1  A: 

A query focusing on sequential ID column is not enough. The ID sequence may skip numbers during insert if transaction fails, so if you have ID = (1,3) it does not mean that ID=2 was deleted, it may have been skipped. You have to use something to catch deleted records, like trigger, or OUTPUT DELETED.* -- or use something to compare against, like a snapshot, a backup, a history table.

Damir Sudarevic
A: 

A completely different approach, which does require a refactoring of your example, would be to not delete from the table, but to have a separate table of deleted item ids (or to have a field in the table showing the row's status). So you could then just select on those data instead. (This would wrap around @Damir's observation that you cannot differentiate between deletions and insert errors.)

Unsliced
+2  A: 

Left join with a numbers table and grab all the ones where it is null, this uses the built in numbers table but it is better to have your own

example of what the code would look like

create table #bla(id int)

insert #bla values(1)
insert #bla values(2)
insert #bla values(4)
insert #bla values(5)
insert #bla values(9)
insert #bla values(12)



select number from master..spt_values s
left join #bla b on s.number = b.id
where s.type='P'
and s.number < (select MAX(id) from #bla)
and  b.id is null

output

0 3 6 7 8 10 11

See here: How to return all the skipped identity values from a table in SQL Server for some more detail

SQLMenace
In the end this looks like the fastest method =)
fung