I want to pull out duplicate records in a MySQL database. The duplicated addresses can be found with:

SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1

Which results in:

100 MAIN ST    2

I would like the query to return each row that is a duplicate. Something like:

JIM    JONES    100 MAIN ST
JOHN   SMITH    100 MAIN ST

Any thoughts on how this can be done? I'm trying to avoid running the first query and then looking up the duplicates with a second query in the code.

A: 

Not going to be very efficient, but it should work:

-- OUTER and INNER are reserved words in MySQL, so the aliases are o and i
SELECT *
FROM list AS o
WHERE (SELECT COUNT(*)
        FROM list AS i
        WHERE i.address = o.address) > 1;
Chad Birch
A: 

Do a nested SELECT: the inner one picks all the duplicated addresses, and the outer one gets all the records with those addresses.

Javier
+5  A: 

The key is to rewrite this query so that it can be used as a subquery.

SELECT firstname, lastname, list.address FROM list
INNER JOIN (SELECT address FROM list
GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
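
To try the join against the grouped subquery without a MySQL server, here is a minimal sketch that runs the same statement through Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption, and so are the sample rows and the firstname/lastname columns beyond what the question shows.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO list (firstname, lastname, address) VALUES (?, ?, ?)",
    [
        ("JIM", "JONES", "100 MAIN ST"),
        ("JOHN", "SMITH", "100 MAIN ST"),
        ("ANN", "LEE", "7 OAK AVE"),
    ],
)

# The derived table "dup" holds each address that occurs more than once;
# joining it back to list returns every row that carries such an address.
duplicates = conn.execute(
    """
    SELECT firstname, lastname, list.address
    FROM list
    INNER JOIN (SELECT address FROM list
                GROUP BY address HAVING COUNT(id) > 1) dup
            ON list.address = dup.address
    ORDER BY list.id
    """
).fetchall()

for row in duplicates:
    print(*row)
```

With this sample data the loop prints the two rows at 100 MAIN ST and leaves out the single row at 7 OAK AVE.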
R. Bemrose
Be careful with subqueries. They can be very bad for performance. If this needs to run often and/or over lots of duplicate records, I would consider moving the processing out of the database and into a dataset.
bdwakefield
It's an uncorrelated subquery, so it shouldn't be too bad assuming either query alone isn't poorly designed.
ʞɔıu
@nick: It's still a valid point in general about subqueries.
R. Bemrose
+2  A: 

This will select duplicates in one table pass, no subqueries.

SELECT  *
FROM    (
        SELECT  ao.*, (@r := @r + 1) AS rn
        FROM    (
                SELECT  @_address := 'N'
                ) vars,
                (
                SELECT  *
                FROM
                        list a
                ORDER BY
                        address, id
                ) ao
        WHERE   CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
                AND (@_address := address ) IS NOT NULL
        ) aoo
WHERE   rn > 1

This query actually emulates ROW_NUMBER(), which is present in Oracle and SQL Server.
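
For comparison, this is roughly what the emulated numbering looks like on an engine that has ROW_NUMBER() built in. It is only a sketch: it uses Python's sqlite3 module rather than MySQL, it assumes the bundled SQLite is 3.25 or newer (the first release with window functions), and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO list (firstname, lastname, address) VALUES (?, ?, ?)",
    [
        ("JIM", "JONES", "100 MAIN ST"),
        ("JOHN", "SMITH", "100 MAIN ST"),
        ("ANN", "LEE", "7 OAK AVE"),
    ],
)

# Number the rows within each address in id order, then keep rn > 1,
# which is the same filter the user-variable query applies.
extra_rows = conn.execute(
    """
    SELECT firstname, lastname, address, rn
    FROM (
        SELECT firstname, lastname, address,
               ROW_NUMBER() OVER (PARTITION BY address ORDER BY id) AS rn
        FROM list
    ) AS t
    WHERE rn > 1
    """
).fetchall()

for row in extra_rows:
    print(*row)
```

Note that filtering on rn > 1 returns only the second and later rows for each address; the first row of every group has rn = 1 and is left out.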

See the article in my blog for details:

Quassnoi
aghh... it would be easier to read that without any formatting at all
Javier
A: 

SELECT * FROM (SELECT address, COUNT(id) AS cnt FROM list GROUP BY address HAVING COUNT(id) > 1) AS dup

DJ
A: 

Why not just INNER JOIN the table with itself?

SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id

A DISTINCT is needed if the address could exist more than two times.
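
The effect of the missing DISTINCT is easy to see with three people at one address. The sketch below runs the self-join through Python's sqlite3 module; SQLite in place of MySQL and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: three rows share one address, one row does not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list ("
    "id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO list (firstname, lastname, address) VALUES (?, ?, ?)",
    [
        ("JIM", "JONES", "100 MAIN ST"),
        ("JOHN", "SMITH", "100 MAIN ST"),
        ("JANE", "DOE", "100 MAIN ST"),
        ("ANN", "LEE", "7 OAK AVE"),
    ],
)

# Same self-join, with and without DISTINCT in the select list.
self_join = """
    SELECT {distinct} a.firstname, a.lastname, a.address
    FROM list a
    INNER JOIN list b ON a.address = b.address
    WHERE a.id <> b.id
"""
plain = conn.execute(self_join.format(distinct="")).fetchall()
deduped = conn.execute(self_join.format(distinct="DISTINCT")).fetchall()

print(len(plain))    # each of the 3 rows matches the 2 others: 6 rows
print(len(deduped))  # DISTINCT collapses them to the 3 people
```

Without DISTINCT every row appears once per matching partner, so the row count grows with the square of the group size.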

rudolfson
A: 

SELECT firstname, lastname, address FROM list WHERE address IN (SELECT address FROM list GROUP BY address HAVING COUNT(*) > 1)

Ryan Roper
A: 

See http://semaphorecorp.com/mpdd/mpdd.html for tricks and considerations regarding comparing addresses for equality.

joe snyder