I have a huge table with the following format:

DATETIME                   NUMBER
--------------------------------------
2009-03-31 16:05:52.000    2453651622
2009-03-31 16:16:12.000    30206080
2009-03-31 16:16:16.000    16890039
2009-03-31 16:16:28.000    2452039696
2009-03-31 16:16:33.000    140851934
2009-03-31 16:16:51.000    2453120306
2009-03-31 16:16:57.000    2453120306
...
2009-04-01 21:15:24.000    2453651622

How can I select the rows that don't have duplicate numbers in the second column if they occur less than 15 minutes apart?

In the example above, the second row containing 2453120306 is a duplicate because it occurs less than 15 minutes after the previous one, and should not be selected.

The last row has the same number as the first row, but it is not a duplicate because it occurs more than 24 hours later.
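To make the intended filtering concrete, here is a minimal Python sketch of one reading of the rule (keep the first occurrence of a number, and drop a later occurrence only if it comes less than 15 minutes after the last *kept* occurrence), run on the sample rows from the question:

```python
from datetime import datetime, timedelta

rows = [
    ("2009-03-31 16:05:52.000", 2453651622),
    ("2009-03-31 16:16:12.000", 30206080),
    ("2009-03-31 16:16:16.000", 16890039),
    ("2009-03-31 16:16:28.000", 2452039696),
    ("2009-03-31 16:16:33.000", 140851934),
    ("2009-03-31 16:16:51.000", 2453120306),
    ("2009-03-31 16:16:57.000", 2453120306),
    ("2009-04-01 21:15:24.000", 2453651622),
]

def dedupe(rows, window=timedelta(minutes=15)):
    """Keep a row unless the same number was kept less than `window` ago."""
    last_kept = {}  # number -> timestamp of the last kept occurrence
    kept = []
    for ts, num in sorted(rows):  # ISO timestamps sort chronologically
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
        if num not in last_kept or t - last_kept[num] >= window:
            last_kept[num] = t
            kept.append((ts, num))
    return kept

result = dedupe(rows)
# Drops the 16:16:57 occurrence of 2453120306 (6 seconds after the
# first) and keeps the 2453651622 row that recurs more than a day later.
```

Note the question is ambiguous about chains of duplicates (e.g. three rows each 10 minutes apart); this sketch measures from the last kept row, which is only one possible reading.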

A: 

try this:

Select * From Table t
Where Not Exists
   (Select * From Table
    Where DateTime Between DateAdd(second, -450,  t.DateTime)
                       And DateAdd(second,  450,  t.DateTime)
        -- the next line excludes the row itself; a PK is a safer way to do this if one is available
        And DateTime <> t.DateTime 
        And Number = t.Number) 

Also edited to use a sliding 15-minute window (±450 seconds), based on your comment above.
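As a sanity check, the same `NOT EXISTS` pattern can be run against the sample data in SQLite via Python (not the original SQL Server environment; the table and column names here are made up, and SQLite's `datetime(x, '-450 seconds')` stands in for `DATEADD`). One thing this check makes visible: because *each* member of a close pair sees the other inside the window, this query drops both rows of the pair rather than keeping the first:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (dt TEXT, num INTEGER)")
con.executemany("INSERT INTO readings VALUES (?, ?)", [
    ("2009-03-31 16:05:52.000", 2453651622),
    ("2009-03-31 16:16:12.000", 30206080),
    ("2009-03-31 16:16:16.000", 16890039),
    ("2009-03-31 16:16:28.000", 2452039696),
    ("2009-03-31 16:16:33.000", 140851934),
    ("2009-03-31 16:16:51.000", 2453120306),
    ("2009-03-31 16:16:57.000", 2453120306),
    ("2009-04-01 21:15:24.000", 2453651622),
])

# datetime(t.dt, '-450 seconds') plays the role of DATEADD(second, -450, ...)
rows = con.execute("""
    SELECT * FROM readings AS t
    WHERE NOT EXISTS
       (SELECT 1 FROM readings AS u
        WHERE u.dt BETWEEN datetime(t.dt, '-450 seconds')
                       AND datetime(t.dt, '+450 seconds')
          AND u.dt <> t.dt
          AND u.num = t.num)
""").fetchall()
# Both 2453120306 rows disappear: each one finds the other
# within its +/- 450-second window.
```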

Charles Bretana
Thanks, Charles. I liked your "Where Not Exists" idea! Unfortunately your query is returning zero rows.
citrus
@citrus, sorry, you also have to eliminate the record itself from the comparison in the subquery (i.e., for every row there is always a row within 15 minutes with the same number: itself!). I have corrected the SQL.
Charles Bretana
A: 
-- distinct required in case there are rows with 
-- exactly the same values for datetime and number
SELECT DISTINCT a.*
FROM your_table AS a
    LEFT JOIN your_table AS b
        ON a.[number] = b.[number]
            AND a.[datetime] > b.[datetime]
            AND a.[datetime] <= DATEADD(minute, 15, b.[datetime])
WHERE b.Number IS NULL
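A rough check of this anti-join on the sample data (again SQLite via Python rather than SQL Server, with hypothetical table and column names, and `datetime(b.dt, '+15 minutes')` in place of `DATEADD`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (dt TEXT, num INTEGER)")
con.executemany("INSERT INTO readings VALUES (?, ?)", [
    ("2009-03-31 16:05:52.000", 2453651622),
    ("2009-03-31 16:16:12.000", 30206080),
    ("2009-03-31 16:16:16.000", 16890039),
    ("2009-03-31 16:16:28.000", 2452039696),
    ("2009-03-31 16:16:33.000", 140851934),
    ("2009-03-31 16:16:51.000", 2453120306),
    ("2009-03-31 16:16:57.000", 2453120306),
    ("2009-04-01 21:15:24.000", 2453651622),
])

# A row survives only if no earlier row with the same number
# exists within the preceding 15 minutes (the LEFT JOIN finds
# no match, so b.num IS NULL).
rows = con.execute("""
    SELECT DISTINCT a.*
    FROM readings AS a
        LEFT JOIN readings AS b
            ON a.num = b.num
               AND a.dt > b.dt
               AND a.dt <= datetime(b.dt, '+15 minutes')
    WHERE b.num IS NULL
""").fetchall()
# Keeps the first 2453120306 row (16:16:51), drops the second
# (16:16:57), and keeps the 2453651622 row that recurs a day later.
```

One caveat: each row is compared against *all* earlier rows, not just previously kept ones, so a chain of duplicates each spaced under 15 minutes apart collapses to its very first row.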
LukeH
Thanks, Luke, it seems to be almost doing the job. The problem is that it is keeping the last of the duplicates instead of the first one.
citrus
@citrus: I've edited to switch around the datetime comparisons so it should now return the first rather than the last.
LukeH