I have a query that I use for finding duplicate data. Lately the query has become very slow, and it keeps getting slower. This is the query I am using (columns renamed):

  SELECT col1, 
         COUNT(col1) AS Counter 
    FROM people 
GROUP BY col1 
  HAVING (Counter > 1)

I have indexed col1 (which is a varchar(500)), but the query takes an epic amount of time to execute. Is there a better way to handle this, or am I stuck?

+2  A: 

Try this:

SELECT  *
FROM    people po
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    people pi
        WHERE   pi.col1 = po.col1
        LIMIT 1, 1
        )

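This returns every row whose col1 value appears more than once. LIMIT 1, 1 skips the first matching row and asks for the second, so EXISTS is true only when at least two rows share the same col1.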
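
If you want to try it on a small table first, here is a minimal sketch for MySQL. Only the names people and col1 come from the question; the id column and the sample values are made up for illustration.

```sql
-- Hypothetical sample table: only people and col1 are from the question.
CREATE TABLE people (
    id   INT AUTO_INCREMENT PRIMARY KEY,  -- assumed surrogate key
    col1 VARCHAR(500)
);

-- Made-up values: 'alice' appears three times, 'bob' twice, 'carol' once.
INSERT INTO people (col1)
VALUES ('alice'), ('bob'), ('alice'), ('carol'), ('bob'), ('alice');

-- Same query as above. LIMIT 1, 1 means "skip 1 row, return 1 row",
-- so the subquery yields a row only if a second match exists.
SELECT  *
FROM    people po
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    people pi
        WHERE   pi.col1 = po.col1
        LIMIT 1, 1
        );
```

On this data the query should return the five 'alice' and 'bob' rows and leave 'carol' out. The subquery always finds the outer row itself as its first match, which is why the offset of 1 is needed. The speedup on a large table presumably depends on the index on col1, since each lookup can stop as soon as a second match is found.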

Quassnoi
This works perfectly, and it's significantly faster. Thanks!
Jon Tackabury
That's very clever!
David M
Just to let you know, the query went from 90 seconds down to 4 seconds with this change. Thanks!
Jon Tackabury
@JonT: glad to hear that, my pleasure :)
Quassnoi