I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?
Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
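To try the GROUP BY / HAVING approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name people and the sample rows are made up for illustration, and SQLite stands in for whatever database you actually use, so treat it as a demonstration of the idea rather than a drop-in solution.

```python
import sqlite3

# Hypothetical sample data: the table `people` and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name VARCHAR(50))")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

# Group rows by name and keep only the groups that contain more than one row.
duplicates = conn.execute(
    """
    SELECT name, COUNT(*) AS c
    FROM people
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

for name, c in duplicates:
    print(name, c)
```

With this sample data, "carol" is left out because she appears only once.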
SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING count(*) > 1;
SELECT ColumnA, COUNT( * ) FROM Table GROUP BY ColumnA HAVING COUNT( * ) > 1
Assuming your table is named TableABC, the column you want to check is Col, and the primary key of TableABC is Key:
SELECT a.Key, b.Key, a.Col
FROM TableABC a, TableABC b
WHERE a.Col = b.Col
AND a.Key <> b.Key
The advantage of this approach over the GROUP BY answers above is that it also returns the Key of each duplicate row.
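One thing to be aware of with the self-join: because the condition is <>, each pair of matching rows comes back twice, once in each order. The sketch below shows this with Python's built-in sqlite3 module. The table items and its columns id and col are hypothetical stand-ins for TableABC, Key and Col, and the a.id < b.id variant is my own suggestion, not part of the answer above.

```python
import sqlite3

# Hypothetical stand-in for TableABC (Key -> id, Col -> col).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, col VARCHAR(50))")
conn.executemany(
    "INSERT INTO items (id, col) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "z")],
)

# Self-join with <>: every duplicate pair is reported in both orders.
both_orders = conn.execute(
    """
    SELECT a.id, b.id, a.col
    FROM items a, items b
    WHERE a.col = b.col
      AND a.id <> b.id
    ORDER BY a.id, b.id
    """
).fetchall()

# Self-join with <: every duplicate pair is reported exactly once.
each_pair_once = conn.execute(
    """
    SELECT a.id, b.id, a.col
    FROM items a, items b
    WHERE a.col = b.col
      AND a.id < b.id
    ORDER BY a.id, b.id
    """
).fetchall()

print(both_orders)
print(each_pair_once)
```

Keep in mind that a value appearing n times produces n * (n - 1) rows with <>, so the output can grow quickly on heavily duplicated columns.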
SELECT *
FROM mytable mto
WHERE EXISTS
(
SELECT 1
FROM mytable mti
WHERE mti.varchar_column = mto.varchar_column
LIMIT 1, 1
)
This query returns complete records, not just distinct varchar_column values.
This query doesn't use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the whole COUNT(*); you just need to know whether there are two rows with the same value.
Having an index on varchar_column will, of course, speed up this query greatly.
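For comparison, complete records can also be fetched by combining the GROUP BY / HAVING idea with an IN subquery. Note that this is a different technique from the EXISTS query above, and it does use COUNT(*), so it does not have the same performance characteristics. The sketch uses Python's built-in sqlite3 module, and the table mytable_demo with its sample rows is hypothetical.

```python
import sqlite3

# Hypothetical sample table; names and rows are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable_demo (id INTEGER PRIMARY KEY, varchar_column VARCHAR(50))"
)
conn.executemany(
    "INSERT INTO mytable_demo (id, varchar_column) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "z"), (5, "y")],
)

# Optional index on the column being checked for duplicates.
conn.execute("CREATE INDEX idx_demo_varchar ON mytable_demo (varchar_column)")

# Return every full row whose varchar_column value occurs more than once.
full_rows = conn.execute(
    """
    SELECT *
    FROM mytable_demo
    WHERE varchar_column IN (
        SELECT varchar_column
        FROM mytable_demo
        GROUP BY varchar_column
        HAVING COUNT(*) > 1
    )
    ORDER BY id
    """
).fetchall()

for row in full_rows:
    print(row)
```

Row 4 is excluded because "z" appears only once; all other rows are returned in full.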