What is the most efficient way to detect duplicates in a 10-column / 50K-row table? I'm using MSSQL 8.0 (SQL Server 2000).
To detect them, just group by as Guge said.
select fieldA, fieldB, count(*) from table
group by fieldA, fieldB
having count(*) > 1
If you want to delete the dupes, in pseudocode (a concrete sketch follows below)...
select distinct into a temp table
truncate the original table
insert the temp table's rows back into the original table
Note that TRUNCATE TABLE will fail outright if the table is referenced by a FK constraint, so be smart about dropping and re-creating constraints, and make sure you don't orphan records.
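For example, a minimal T-SQL sketch of those three steps (MyTable and #dedup are placeholder names, not from the original post):

SELECT DISTINCT * INTO #dedup FROM MyTable
-- TRUNCATE fails while MyTable is referenced by a FK constraint,
-- so drop or disable those constraints first (see the caveat above)
TRUNCATE TABLE MyTable
INSERT INTO MyTable SELECT * FROM #dedup
DROP TABLE #dedup

One caveat: if MyTable has an IDENTITY column, SELECT DISTINCT * will never collapse anything (the identity values differ on every row), and the INSERT back would also need SET IDENTITY_INSERT, so this sketch assumes a table without one.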
Try this:
Select * From Table
Group By [List all fields in the Table here]
Having Count(*) > 1
To show an example of what others have been describing:
SELECT
Col1, -- All of the columns you want to dedupe on
Col2, -- which is not necessarily all of the columns
Col3, -- in the table
Col4,
Col5,
Col6,
Col7,
Col8,
Col9,
Col10
FROM
MyTable
GROUP BY
Col1,
Col2,
Col3,
Col4,
Col5,
Col6,
Col7,
Col8,
Col9,
Col10
HAVING
COUNT(*) > 1
In addition to the suggestions provided, I would then go to the effort of preventing duplicates in the future, rather than trying to locate them later.
This is done using unique indexes on columns (or groups of columns) that are supposed to be unique. Remember that data in the database can be modified from locations other than the specific app you are working on, so it's best to define what is and isn't allowed in a table at the DB level.
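For example, a sketch assuming a hypothetical MyTable where (Col1, Col2) should be unique:

-- a unique index rejects any insert/update that would create a dupe
CREATE UNIQUE INDEX IX_MyTable_Col1_Col2 ON MyTable (Col1, Col2)

-- or, expressed as a constraint (SQL Server backs it with a unique index anyway)
ALTER TABLE MyTable ADD CONSTRAINT UQ_MyTable_Col1_Col2 UNIQUE (Col1, Col2)

Either form works on SQL Server 2000; of course, any existing duplicates have to be cleaned out before the index can be created.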
Looks like everything uses group by with an explicit column list, which certainly is the right way to do it.
But if you're working with lots of small tables (small in row count), is there a generic solution, even one that sacrifices some efficiency?
I'm stuck in such a situation: I need to run dupe scanning on a bunch of small reference tables that are frequently re-loaded via bcp. About half of them don't have constraints preventing dupes, which has caused... issues.
(I'm trying to get them to add the indices, but in the meantime...)
Thoughts?
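One generic approach that might work here (a sketch only, assuming SQL Server 2000; the @table value is a placeholder) is to build the column list from INFORMATION_SCHEMA.COLUMNS and run the same GROUP BY check via dynamic SQL:

DECLARE @table sysname, @cols varchar(8000), @sql varchar(8000)
SET @table = 'MyRefTable'  -- placeholder: the table to scan

-- concatenate every column name into a comma-separated list
-- (this variable-concatenation idiom is common on 2000, but undocumented)
SELECT @cols = ISNULL(@cols + ', ', '') + QUOTENAME(COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table
ORDER BY ORDINAL_POSITION

-- any group with more than one row is a set of dupes
SET @sql = 'SELECT ' + @cols + ', COUNT(*) AS dupe_count FROM '
         + QUOTENAME(@table) + ' GROUP BY ' + @cols + ' HAVING COUNT(*) > 1'
EXEC (@sql)

It sacrifices efficiency and breaks on text/ntext/image columns (they can't appear in a GROUP BY), but for small, frequently re-loaded reference tables that may be acceptable.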