ansaurus

Question

How do I extract duplicate records from a table using multiple fields?

Answer 1

+2 A:

WITH q AS (
          SELECT Company.*,
                 ROW_NUMBER() OVER (PARTITION BY CompanyState, CompanyCity, CompanyName ORDER BY CompanyID) AS rnName,
                 ROW_NUMBER() OVER (PARTITION BY CompanyState, CompanyCity, CompanyAddress ORDER BY CompanyID) AS rnAddress
          FROM Company
          )
SELECT    *
WHERE     rnName > 1 OR rnAddress > 1

Note, though, that if your data will look like this:

CompanyID    CompanyName  CompanyAddress 
---------    -----------  --------------
1            McDonalds    Avenue 1
2            McDonalds    Avenue 2
3            Starbucks    Avenue 2

, then both records 2 and 3 will be deleted (which is what you requested but probably not what you wanted)

If you just want to list all rows having duplicates, then issue:

SELECT  *
FROM    Company co
WHERE   EXISTS
        (
        SELECT  1
        FROM    Company cn
        WHERE   cn.CompanyState = co.CompanyState
                AND cn.CompanyCity = co.CompanyCity
                AND cn.CompanyName = co.CompanyName
                AND cn.CompanyID <> co.CompanyID
        )
        OR EXISTS
        (
        SELECT  1
        FROM    Company ca
        WHERE   ca.CompanyState = co.CompanyState
                AND ca.CompanyCity = co.CompanyCity
                AND ca.CompanyAddress = co.CompanyAddress
                AND ca.CompanyID <> co.CompanyID
        )

This will work in SQL Server 2000 too.

Having indexes on (CompanyState, CompanyCity, CompanyName) and (CompanyState, CompanyCity, CompanyAddress) will greatly improve this query.

Quassnoi 2009-04-03 15:53:49

I want all the duplicates listed, they're not going to be put straight into a deletion queue, so list all the dups is what I'm after. We have to merge invoices across from duplicates to the master record before all the dups are removed.

BenAlabaster 2009-04-03 16:04:39

Should this work in SQL 2000 Server? It gives me the following errors:Msg 156, Level 15, State 1, Line 1Incorrect syntax near the keyword 'WITH'.Msg 195, Level 15, State 10, Line 3'ROW_NUMBER' is not a recognized function name.

BenAlabaster 2009-04-03 16:07:23

No, this will not work in SQL Server 2000

Quassnoi 2009-04-03 16:11:27

Perfect, your second response proved far more successful :)

BenAlabaster 2009-04-03 16:41:54

Answer 2

A:

SELECT
     C1.CompanyID,
     C2.CompanyID
FROM
     Company C1
INNER JOIN Company C2 ON
     (C2.CompanyName = C1.CompanyName OR C2.CompanyAddress = C1.CompanyAddress) AND
     C2.CompanyCity = C1.CompanyCity AND
     C2.CompanyState = C2.CompanyState AND
     C2.CompanyID > C1.CompanyID

If you have three or more matches then they will appear multiple times in the list. There are various ways to handle that depending on what exactly you want to get back from the query.

I would also strongly suggest that you look into better front-end coding to restrict how addresses are getting into your system as well as user training.

Tom H. 2009-04-03 15:55:54

Hehe, yes, I was hired to assist with the better front end coding - I'm on it, but I'm also in the process of cleaning up the 5 years of mayhem going on before my arrival...

BenAlabaster 2009-04-03 16:08:55

Okay, any suggestions for getting back only single instances of duplicates? I guess I could parse them with a cursor, but is there a query based way of doing that?

BenAlabaster 2009-04-03 16:26:00

What do you define as a single instance and what would you want to see? For example, if you have company A and B that match on name, city, state and company C that matches A on address, city, state how would you want to see that?

Tom H. 2009-04-03 18:14:13

That's a good question I suppose... I would consider that if A matches B on name, city and C matches A on address, city and state that each of these records should be displayed only a single time such that all 3 are considered "duplicate".

BenAlabaster 2009-04-03 20:32:07

BenAlabaster 2009-04-03 20:33:44

Answer 3

A:

Try something like this...

Select field1, field2, ... etc, count(*)
FROM Company,
GROUP BY field1, field2, ...
HAVING count(*) > 1

This should show you what is duplicated.

Eppz 2009-04-03 15:55:59

+1 Yep, this is totally the answer I was going to give.

Neil Barnwell 2009-04-03 15:59:51

How will it show duplicates on (CompanyName OR CompanyAddress)?

Quassnoi 2009-04-03 16:03:19

But wouldn't account for the *OR* situation... it only accounts if all the fields in the group by match

BenAlabaster 2009-04-03 16:03:22

Easy enough to add a where clause to meet the conditions.

Eppz 2009-04-03 17:09:38

ansaurus

tags:

views:

answers:

How do I extract duplicate records from a table using multiple fields?

related questions