How can I write a statement to accomplish the following?

Let's say a table has two columns (both nvarchar) with the following data:

col1   col2
10000  10
10000  20
10001  10
10002  30
10002  40
10002  50

I'd like to keep only the following data:

col1   col2
10000  10
10001  10
10002  30

That is, I want to remove the rows with duplicate col1 values, keeping for each col1 value only the record with the minimum value in the second column (neither column is a primary key).

How can I accomplish this?
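For anyone who wants to try the answers below, the sample data can be set up with a short script. This is only a sketch: the question does not name the table or give column sizes, so the table name `tbl` and the `nvarchar(10)` lengths are assumptions.

```sql
-- Hypothetical table name and column lengths; the question only says
-- that both columns are nvarchar and that neither is a primary key.
CREATE TABLE tbl (
    col1 nvarchar(10) NOT NULL,
    col2 nvarchar(10) NOT NULL
);

-- One INSERT per row so the script also runs on SQL Server 2005.
INSERT INTO tbl (col1, col2) VALUES (N'10000', N'10');
INSERT INTO tbl (col1, col2) VALUES (N'10000', N'20');
INSERT INTO tbl (col1, col2) VALUES (N'10001', N'10');
INSERT INTO tbl (col1, col2) VALUES (N'10002', N'30');
INSERT INTO tbl (col1, col2) VALUES (N'10002', N'40');
INSERT INTO tbl (col1, col2) VALUES (N'10002', N'50');
```

The answers use different table names (`test`, `tbl`, `Table1`), so adjust the name to match whichever statement you are testing.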

A: 

Sorry, I misunderstood the question.


SELECT col1, MIN(col2) as col2
FROM table
GROUP BY col1

That, of course, returns the rows in question. But assuming you can't alter the table to add a unique identifier, you would need to do something like this to delete the others:


DELETE FROM test
WHERE col1 + '|' + col2 NOT IN
(SELECT col1 + '|' + MIN(col2)
FROM test
GROUP BY col1)

This should work, assuming that the pipe character never appears in your data.
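To see why the delimiter assumption matters, consider two different column pairs that concatenate to the same string once a pipe appears in the data. The literal values here are made up purely for illustration.

```sql
-- Two distinct (col1, col2) pairs: ('a|b', 'c') and ('a', 'b|c').
-- Both expressions evaluate to the same string, 'a|b|c', so the
-- NOT IN comparison could no longer tell the two rows apart.
SELECT N'a|b' + N'|' + N'c'   AS first_pair,
       N'a'   + N'|' + N'b|c' AS second_pair;
```

With purely numeric strings like the ones in the question, this collision cannot happen.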

Jason Francis
Doesn't really answer the question though. OP asked about deleting rows, not selecting them
Simon Nickerson
Right. My brain isn't in gear yet. I think the correction should work.
Jason Francis
A: 

Ideally, you'd like to be able to say:

DELETE
FROM tbl
WHERE (col1, col2) NOT IN (SELECT col1, MIN(col2) AS col2 FROM tbl GROUP BY col1)

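Unfortunately, that's not allowed in T-SQL, but there is a proprietary extension with a second FROM clause, which lets you join the target table against the minimum for each col1:

DELETE
FROM tbl
FROM tbl
INNER JOIN (SELECT col1, MIN(col2) AS min_col2 FROM tbl GROUP BY col1) AS m
    ON tbl.col1 = m.col1
WHERE tbl.col2 <> m.min_col2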

In general:

DELETE
FROM tbl
WHERE col1 + '|' + col2 NOT IN (SELECT col1 + '|' + MIN(col2) FROM tbl GROUP BY col1)

Or other workarounds.
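One such workaround, sketched here as an illustration rather than as the only option, is a correlated subquery. It avoids the string concatenation entirely, so no delimiter assumption is needed. The alias `t2` is introduced just for this example.

```sql
-- Delete every row whose col2 is not the minimum col2 for its col1.
-- The subquery is re-evaluated per row of tbl via the correlation
-- on col1, so no composite key or concatenation is required.
DELETE FROM tbl
WHERE col2 <> (SELECT MIN(t2.col2)
               FROM tbl AS t2
               WHERE t2.col1 = tbl.col1);
```

Like the other MIN-based statements, this compares col2 as a string because the column is nvarchar, and it leaves exact duplicates of the minimum row in place.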

Cade Roux
+2  A: 

This should work for you:

;
WITH NotMin AS
(
    SELECT Col1, Col2, MIN(Col2) OVER(Partition BY Col1) AS TheMin
    FROM Table1
)

DELETE Table1
--SELECT * 
FROM Table1
INNER JOIN NotMin
ON Table1.Col1 = NotMin.Col1 AND Table1.Col2 = NotMin.Col2 
    AND Table1.Col2 != TheMin

This uses a CTE (like a derived table, but cleaner) and the OVER clause as a shortcut for less code. I also added a commented-out SELECT so you can see the matching rows first (verify before deleting). This will work in SQL Server 2005/2008.

Thanks, Eric

Strommy
On large tables, this may not be optimal performance-wise. If that's the case, we can work on a better answer.
Strommy
I like to use row_number() or rank() for this kind of thing personally... but it's still good and should be accepted.
Rob Farley
Good point. I'd be interested in seeing your solution in that regard. I always like to see novel uses for the over clause. :-)
Strommy
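For reference, the ROW_NUMBER() variant mentioned in the comments could look roughly like the following. This is a sketch, not the commenter's actual solution; the CTE name `Ranked` and the column alias `rn` are made up for the example, and `Table1` is reused from the answer above.

```sql
;WITH Ranked AS
(
    -- Number the rows within each Col1 group, smallest Col2 first.
    -- Col2 is nvarchar, so this ordering is a string comparison; if the
    -- values are numeric with varying lengths, ordering by
    -- CAST(Col2 AS int) would be needed instead (an assumption about the data).
    SELECT Col1, Col2,
           ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) AS rn
    FROM Table1
)
-- Deleting through the CTE removes the underlying rows in Table1.
DELETE FROM Ranked
WHERE rn > 1;
```

ROW_NUMBER() keeps exactly one row per Col1, even when the minimum Col2 value occurs more than once. Swapping in RANK() would instead keep every row tied for the minimum.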