views:

77

answers:

1

I thought I had this figured out but it turns out I'm just deleting the first record. The following returns the duplicate rows. All have a count of 2. I just want to delete the first one for each duplicate record.

select scorestudentid, scoreadvisor, scorecorrect, count(*) 
from scores
where scoretestid = 3284
group by scorestudentid, scoreadvisor, scorecorrect
having count(scorestudentid) > 1

Which returns:

scorestudentid  scoreadvisor  scorecorrect  no column name
13033719        28059     3.0           2
13033777        28086     3.0           2
13033826        28147     3.0           2
13033960        28023     3.0           2

So I put this together thinking it would work:

set rowcount 1
delete
from scores
where scoretestid = 3284 
and scorestudentid in (
    select scorestudentid
    from scores
    where scoretestid = 3284
    group by scorestudentid
    having count(scorestudentid) > 1)

It really seems like it should be a simple concept but I'm not getting it.

Based on Thomas script I updated the query to fit but it still doesn't work.

Delete Scores
Where Exists    (
                Select 1
                From Scores As S2
                Where S2.ScoreStudentId = Scores.ScoreStudentId
                        And S2.ScoreAdvisor = Scores.ScoreAdvisor
                        And S2.ScoreCorrect = Scores.ScoreCorrect
                Group By S2.ScoreStudentId, S2.ScoreAdvisor, S2.ScoreCorrect
                Having Count(*) > 1
                    And Min(S2.NewScoreID) = Scores.NewScoreID
                )
    And Scores.ScoreTestId = 3284
+2  A: 

The trick is using the Primary Key column (you do have one, correct?) and simply finding the first PK value that matches the criteria you want. If for some crazy reason you do not have a primary key column, then add an Identity column and make it the primary key and then do the delete.

EDIT Revised to make it more generic. If you remove the final filter on ScoreTest, it will remove all duplicates based on ScoreStudentId, ScoreAdvisor and ScoreCorrect.

Delete Scores
Where Exists    (
                Select 1
                From Scores As S2
                Where S2.ScoreStudentId = Scores.ScoresStudentId
                        And S2.ScoreAdvisor = Scores.ScoreAdvisor
                        And S2.ScoreCorrect = Scores.ScoreCorrect
                Group By S2.ScoreStudentId, S2.ScoreAdvisor, S2.ScoreCorrect
                Having Count(*) > 1
                    And Min(S2.PrimaryKeyColumn) = Scores.PrimaryKeyColumn
                )
    And Scores.ScoreTest = 3284
Thomas
Thanks Thomas! This works great. So this selects 1 from the table joined to itself, does the group by and the count greater than 1. So this would work with any number of duplicates and will always be left with 1 unique record? Can someone explain the 'MIN'?
Mikecancook
@Mikecancook - RE: Min, we need to choose one of the duplicates to remove. In this case, I'm choosing the duplicate with the lowest PK number. I could have just as easily used Max and removed the duplicate with the highest PK.
Thomas
I must be missing something because I still can't get this to work. It just says 0 rows affected. I'll edit my post to reflect my changes.
Mikecancook
@Mikecancook - If you change the `Delete Scores` part of the query to `Select From Scores` and run it (given what you said, you should get no rows). Now comment the `And Min(S2.PrimaryKeyColumn) = Scores.PrimaryKeyColumn` and see if you get anything. If not, then the question is whether there are still duplicates on those three columns.
Thomas
@Thomas - Ok, so I think I figured out what the problem is. Apparently, NewScoreID is a varchar datatype and not the INT PK I thought it was. I messed up the table moving it from a production environment to a testing environment. As usual, it was pilot error.
Mikecancook