views:

640

answers:

6

I am trying to selectively delete records from a SQL Server 2005 table without looping through a cursor. The table can contain many records (sometimes > 500,000) so looping is too slow.

Data:

ID, UnitID, Day, Interval, Amount

1   100     10   21        9.345

2   100     10   22        9.367

3   200     11   21        4.150

4   300     11   21        4.350

5   300     11   22        4.734

6   300     11   23        5.106

7   400     13   21       10.257

8   400     13   22       10.428

Key is: ID, UnitID, Day, Interval

In this example I wish to delete Records 2, 5 and 8 - they are adjacent to an existing record (based on the key).

Note: record 6 would not be deleted because once 5 is gone it is not adjecent any longer.

Am I asking too much?

+1  A: 

I don't think what you're asking for is possible — but you may be able to get close. It appears you can almost do it by finding records with a self-join like this:

SELECT t1.id
FROM
  table t1 JOIN table t2 ON (
    t1.unitid = t2.unitid AND
    t1.day = t2.day AND
    t1.interval = t2.interval - 1
  )

but the problem is, that'll find id=6 as well. However, if you create a temporary table from this data, it may be much smaller than your original data, and thus far faster to scan with a cursor (to fix the id=6 problem). You can then do a DELETE FROM table WHERE id IN (SELECT id FROM tmp_table) to kill the rows.

There may be a way to fix the ID=6 problem w/o a cursor, but if so, I don't see it.

derobert
If you've already put the original select into a table variable, then you could use a WHILE instead of a cursor to fix the id=6 problem.
Scott Whitlock
A: 

There is the WHILE statement, which is an alternative to the cursor. That combined with table variables might let you do the same thing within a performance bound you're OK with.

Scott Whitlock
A: 
DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.367)
INSERT INTO @Table VALUES (3, 200, 11, 21, 4.150)
INSERT INTO @Table VALUES (4, 300, 11, 21, 4.350)
INSERT INTO @Table VALUES (5, 300, 11, 22, 4.734)
INSERT INTO @Table VALUES (6, 300, 11, 23, 5.106)
INSERT INTO @Table VALUES (7, 400, 13, 21, 10.257)
INSERT INTO @Table VALUES (8, 400, 13, 22, 10.428)

DELETE FROM @Table
WHERE ID IN (
  SELECT t1.ID
  FROM @Table t1
       INNER JOIN @Table t2 
            ON  t2.UnitID = t1.UnitID 
                AND t2.Day = t1.Day 
                AND t2.Interval = t1.Interval - 1
       LEFT OUTER JOIN @Table t3 
            ON  t3.UnitID = t2.UnitID 
                AND t3.Day = t2.Day 
                AND t3.Interval = t2.Interval - 1
  WHERE t3.ID IS NULL)

SELECT * FROM @Table
Lieven
This will delete only the first adjacent value. If we have `21`, `22`, `23` and `24`, this will delete `22` but leave `24`.
Quassnoi
You are right. Make that while exists (select...) delete from ...
Lieven
+4  A: 

See these articles in my blog for performance detail:


The main idea for the query below is that we should delete all even rows from continuous ranges of intervals.

That is, if for given (unitId, Day) we have the following intervals:

1
2
3
4
6
7
8
9

, we have two continuous ranges:

1
2
3
4

and

6
7
8
9

, and we should delete every even row:

1
2 -- delete
3
4 -- delete

and

6
7 -- delete
8
9 -- delete

, so that we get:

1
3
6
8

Note that "even rows" means "even per-range ROW_NUMBER()s" here, not "even values of interval".

Here's the query:

DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.345)
INSERT INTO @Table VALUES (3, 200, 11, 21, 9.345)
INSERT INTO @Table VALUES (4, 300, 11, 21, 9.345)
INSERT INTO @Table VALUES (5, 300, 11, 22, 9.345)
INSERT INTO @Table VALUES (6, 300, 11, 23, 9.345)
INSERT INTO @Table VALUES (7, 400, 13, 21, 9.345)
INSERT INTO @Table VALUES (8, 400, 13, 22, 9.345)
INSERT INTO @Table VALUES (9, 400, 13, 23, 9.345)
INSERT INTO @Table VALUES (10, 400, 13, 24, 9.345)
INSERT INTO @Table VALUES (11, 400, 13, 26, 9.345)
INSERT INTO @Table VALUES (12, 400, 13, 27, 9.345)
INSERT INTO @Table VALUES (13, 400, 13, 28, 9.345)
INSERT INTO @Table VALUES (14, 400, 13, 29, 9.345)

;WITH   rows AS
        (
        SELECT  *,
                ROW_NUMBER() OVER
                (
                PARTITION BY
                        (
                        SELECT  TOP 1 qi.id AS mint
                        FROM    @Table qi
                        WHERE   qi.unitid = qo.unitid
                                AND qi.[day] = qo.[day]
                                AND qi.interval <= qo.interval
                                AND NOT EXISTS
                                (
                                SELECT  NULL
                                FROM    @Table t
                                WHERE   t.unitid = qi.unitid
                                        AND t.[day] = qi.day
                                        AND t.interval = qi.interval - 1
                                )
                        ORDER BY
                                qi.interval DESC
                        )
                ORDER BY interval
                ) AS rnm
        FROM    @Table qo
        )
DELETE
FROM    rows
WHERE   rnm % 2 = 0

SELECT  *
FROM    @table

Update:

Here's a more efficient query:

DECLARE @Table TABLE (ID INT, UnitID INT, [Day] INT, Interval INT, Amount FLOAT)

INSERT INTO @Table VALUES (1, 100, 10, 21, 9.345)
INSERT INTO @Table VALUES (2, 100, 10, 22, 9.345)
INSERT INTO @Table VALUES (3, 200, 11, 21, 9.345)
INSERT INTO @Table VALUES (4, 300, 11, 21, 9.345)
INSERT INTO @Table VALUES (5, 300, 11, 22, 9.345)
INSERT INTO @Table VALUES (6, 300, 11, 23, 9.345)
INSERT INTO @Table VALUES (7, 400, 13, 21, 9.345)
INSERT INTO @Table VALUES (8, 400, 13, 22, 9.345)
INSERT INTO @Table VALUES (9, 400, 13, 23, 9.345)
INSERT INTO @Table VALUES (10, 400, 13, 24, 9.345)
INSERT INTO @Table VALUES (11, 400, 13, 26, 9.345)
INSERT INTO @Table VALUES (12, 400, 13, 27, 9.345)
INSERT INTO @Table VALUES (13, 400, 13, 28, 9.345)
INSERT INTO @Table VALUES (14, 400, 13, 29, 9.345)

;WITH    source AS
        (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY unitid, day ORDER BY interval) rn
        FROM    @Table
        ),
        rows AS
        (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY unitid, day, interval - rn ORDER BY interval) AS rnm
        FROM    source
        )
DELETE
FROM    rows
WHERE   rnm % 2 = 0

SELECT  *
FROM    @table
Quassnoi
If your contiguous range of intervals is 2, 3, 4, 5 -- you need to delete odds and not evens.
Matt
@Matt: Need to delete even rows, not even values. Interval `3` will have an even `ROW_NUMBER()` of `2` in your example.
Quassnoi
I get it now. Looks like WITH/ROW_NUMBER is the way to go for this one
Matt
+1. Nice one... again
Lieven
@Lieven: making a blog post out of it :)
Quassnoi
I am still waiting for your last one :)
Lieven
@Lieven: the Saturday's one will be tomorrow, I was busy way too busy on the weekend :) For the Sundays, I'm preparing a huge series (with Flash). I really suck at Flash, that's why it's almost three weeks overdue :)
Quassnoi
A: 

Lieven is so close - it worked for the test set, but if I add a few more records it starts to miss some.

We cannot use any odd/even criteria - we have no idea how the data falls.

Add this data and retry:

INSERT @Table VALUES (9, 100, 10, 23, 9.345)

INSERT @Table VALUES (10, 100, 10, 24, 9.367)

INSERT @Table VALUES (11, 100, 10, 25, 4.150)

INSERT @Table VALUES (12, 100, 10, 26, 4.350)

INSERT @Table VALUES (13, 300, 11, 25, 4.734)

INSERT @Table VALUES (14, 300, 11, 26, 5.106)

INSERT @Table VALUES (15, 300, 11, 27, 10.257)

INSERT @Table VALUES (16, 300, 11, 29, 10.428)

@Ian: retried, it leaves rows `9`, `11`, `13`, `15` and `16`, deletes `10`, `12` and `14`. Isn't it what you want?
Quassnoi
A: 

Quassnoi seems to have it cracked - I ran it with my extended test set and it worked perfectly.

I am going to try a much bigger set next.

Thanks to all for helping!!

:)

Ian

ianhoc
Could you please then mark my answer as accepted? Thanks :)
Quassnoi
There is no check mark by your answer - how do I accept your answer?Thanks againIan
ianhoc
@ianhoc: you seem to login under a different username. If you didn't use an OpenId account to acces the site, then I don't think it will allow you to restore the username. Nevermind then.
Quassnoi
@ianhoc: It looks like you have two seperate Stack Overflow accounts, since some posts here list you as "Ian Hockaday" (the original post) and others as ianhoc. You'll need to log in as Ian Hockaday to mark an answer as correct. (You need to be the original poster to accept an answer.)
Shannon Severance
Sorry Quassnoi, I just registered today, but I think I posted my question *before* I had logged in properly.I wish I could accept your answer, but I cannot.Thanks for the excellent code!!Ian
ianhoc