ansaurus

Question

Using T-SQL to select a dataset with duplicate values removed

Answer 1

+3 A:

You don't need a cursor:

SELECT tmp.*
FROM
(
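    -- Number the rows within each group of identical [Time] values.
    -- Ordering by the partition column itself leaves the choice among duplicates
    -- arbitrary; order by another column if a specific row should be kept.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [Time] ORDER BY [Time]) AS RowNum
    FROM raw_data
) AS tmp
WHERE tmp.RowNum = 1  -- keep only the first row of each group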

LukeH 2010-10-21 09:21:36

Yup, that's the kind of approach I'd take.

Matt Gibson 2010-10-21 09:25:09

Doesn't that just ignore the duplicate time values altogether, or am I misunderstanding PARTITION?

meepmeep 2010-10-21 09:29:57

@meepmeep: It'll return a single row for each distinct `Time` value: `PARTITION BY` creates a "window" for each distinct `Time` value; `ROW_NUMBER` gives each row within each partition an ascending number from 1 to *N*; the outer query just returns all rows where the row number is *1* (that is, the first row from each partition).

LukeH 2010-10-21 09:39:34

Aha! I was misunderstanding PARTITION, that's far more useful than I ever realised. Thank you!

meepmeep 2010-10-21 09:47:39
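
To make the windowing behaviour discussed in the comments concrete, here is a minimal sketch against a throwaway table variable. The `@raw_data` variable, the `Reading` column and the sample values are assumptions made for illustration; only the `[Time]` column and the `raw_data` name come from the answer. It also assumes SQL Server 2008 or later for the multi-row `VALUES` syntax. The sketch orders each partition by `Reading` rather than `[Time]` so that the row kept for each duplicate is predictable.

```sql
-- Hypothetical sample data: Reading and all values are invented for this sketch.
DECLARE @raw_data TABLE ([Time] DATETIME, Reading INT);

INSERT INTO @raw_data ([Time], Reading) VALUES
    ('2010-10-21T09:00:00', 10),
    ('2010-10-21T09:00:00', 12),  -- duplicate Time
    ('2010-10-21T09:01:00', 15),
    ('2010-10-21T09:02:00', 20),
    ('2010-10-21T09:02:00', 18);  -- duplicate Time

-- Step 1: show the number each row receives within its [Time] partition.
-- Numbering restarts at 1 for every distinct [Time] value.
SELECT [Time], Reading,
       ROW_NUMBER() OVER (PARTITION BY [Time] ORDER BY Reading) AS RowNum
FROM @raw_data;

-- Step 2: keep only the first row of each partition.
-- With this sample data that leaves the readings 10, 15 and 18.
SELECT tmp.[Time], tmp.Reading
FROM
(
    SELECT [Time], Reading,
           ROW_NUMBER() OVER (PARTITION BY [Time] ORDER BY Reading) AS RowNum
    FROM @raw_data
) AS tmp
WHERE tmp.RowNum = 1
ORDER BY tmp.[Time];
```

Running the first query on its own is a quick way to see what `PARTITION BY` does: both 09:00 rows and both 09:02 rows get numbered 1 and 2, while the single 09:01 row gets just 1.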
