tags:

views:

27

answers:

3

I would like to filter duplicate rows on conditions so that the rows with minimum modified and maximum active and unique rid and did are picked. self join? or any better approach that would be performance wise better?

Example:

id        rid                  modified                 active         did
1             1             2010-09-07 11:37:44.850              1             1
2             1             2010-09-07 11:38:44.000              1             1
3             1             2010-09-07 11:39:44.000              1             1
4             1             2010-09-07 11:40:44.000              0             1
5             2             2010-09-07 11:41:44.000              1             1
6             1             2010-09-07 11:42:44.000              1             2

Output expected is

1             1             2010-09-07 11:37:44.850              1             1
5             2             2010-09-07 11:41:44.000              1             1
6             1             2010-09-07 11:42:44.000              1             2

Commenting on the first answer, the suggestion does not work for the below dataset(when active=0 and modified is the minimum for that row)

 id        rid                     modified                      active           did
    1             1             2010-09-07 11:37:44.850              1             1
    2             1             2010-09-07 11:38:44.000              1             1
    3             1             2010-09-07 11:39:44.000              1             1
    4             1             2010-09-07 11:36:44.000              0             1
    5             2             2010-09-07 11:41:44.000              1             1
    6             1             2010-09-07 11:42:44.000              1             2
+2  A: 

Assuming SQL Server 2005+. Use RANK() instead of ROW_NUMBER() if you want ties returned.

;WITH YourTable as
(
SELECT 1 id,1 rid,cast('2010-09-07 11:37:44.850' as datetime) modified, 1 active,1 did union all
SELECT 2,1,'2010-09-07 11:38:44.000', 1,1 union all
SELECT 3,1,'2010-09-07 11:39:44.000', 1,1 union all
SELECT 4,1,'2010-09-07 11:36:44.000', 0,1 union all
SELECT 5,2,'2010-09-07 11:41:44.000', 1,1 union all
SELECT 6,1,'2010-09-07 11:42:44.000', 1,2
),cte as
(
SELECT id,rid,modified,active, did,
ROW_NUMBER() OVER (PARTITION BY rid,did ORDER BY active DESC, modified ASC ) RN
FROM YourTable
)
SELECT id,rid,modified,active, did
FROM cte
WHERE rn=1
order by id
Martin Smith
Thanks but I don't think it works for the conditions stated - row with max active and min modified...See the below example
To be honest it isn't really clear what you mean. This would return your expected output but there are about 3 different ways of interpreting what you have said!
Martin Smith
1 1 2010-09-07 11:37:44.850 1 12 1 2010-09-07 11:38:44.000 1 13 1 2010-09-07 11:39:44.000 1 14 1 2010-09-07 11:36:44.000 0 15 2 2010-09-07 11:41:44.000 1 16 1 2010-09-07 11:42:44.000 1 2 expected output the same as the previous example 1 1 2010-09-07 11:37:44.850 1 1 5 2 2010-09-07 11:41:44.000 1 1 6 1 2010-09-07 11:42:44.000 1 2
Oops the edit doesn't work well in the comment editor. I have edited the original question with the second example dataset.
Well that just looks like you want to use `ORDER BY active DESC, modified ASC` instead then?
Martin Smith
Thanks. Looks good now.
A: 

select id, rid, min(modified), max(active), did from foo group by rid, did order by id;

Jeremy
No. First, you have an error with including id in your `select` but not in your `group by`. BUT, If you were to include it in the `group by`, then you'd end up returning every row since id is unique.
Joe Stefanelli
A: 

You can get good performance with a CROSS APPLY if you have a table that has one row for each combination of rid and did:

SELECT
   X.*
FROM
   ParentTable P
   CROSS APPLY (
      SELECT TOP 1 *
      FROM YourTable T
      WHERE P.rid = T.rid AND P.did = T.did
      ORDER BY active DESC, modified
   ) X

Substituting (SELECT DISTINCT rid, did FROM YourTable) for ParentTable would work but will hurt performance.

Also, here is my crazy, single scan magic query which can often outperform other methods:

SELECT
   id = Substring(Packed, 6, 4),
   rid,
   modified = Convert(datetime, Substring(Packed, 2, 4)),
   Active = Convert(bit, 1 - Substring(Packed, 1, 1)),
   did,
FROM
   (
      SELECT
         rid,
         did,
         Packed = Min(Convert(binary(1), 1 - active) + Convert(binary(4), modified) + Convert(binary(4), id)
      FROM
         YourTable
      GROUP BY
         rid,
         did
   ) X

This method is not recommended because it's not easy to understand, and it's very easy to make mistakes with it. But it's a fun oddity because it can outperform other methods in some cases.

Emtucifor