Hi all, can somebody help me write a SQL SELECT for the following task? We have a table that contains some duplicates, so I need to find the rows where name, street and house are the same and group them somehow.

I have almost this situation, but the difference is that I would like to group the rows to find out which one is a duplicate of which.

Thanks in advance.

+6  A: 

This assumes you have an id field. GROUP_CONCAT() collects the ids of each set of duplicate rows into a single comma-separated column:

SELECT    t1.name, t1.street, t1.house, GROUP_CONCAT(DISTINCT t1.id) dupes
FROM      your_table t1
JOIN      your_table t2 ON (t2.name = t1.name AND 
                            t2.street = t1.street AND 
                            t2.house = t1.house)
GROUP BY  t1.name, t1.street, t1.house
HAVING    COUNT(*) > 1;

Test case:

CREATE TABLE your_table (
   id int, 
   name varchar(10), 
   street varchar(10), 
   house varchar(10)
);

INSERT INTO your_table VALUES (1, 'a', 'b', 'c');
INSERT INTO your_table VALUES (2, 'a', '1', 'c');
INSERT INTO your_table VALUES (3, 'a', '2', '3');
INSERT INTO your_table VALUES (4, 'a', 'b', 'c');
INSERT INTO your_table VALUES (5, 'a', 'b', 'c');
INSERT INTO your_table VALUES (6, 'c', 'd', 'e');
INSERT INTO your_table VALUES (7, 'c', 'd', 'e');

Result:

+------+--------+-------+-------+
| name | street | house | dupes |
+------+--------+-------+-------+
| a    | b      | c     | 1,5,4 |
| c    | d      | e     | 6,7   |
+------+--------+-------+-------+
2 rows in set (0.03 sec)
Daniel Vassallo
Perfect, thanks Daniel. If you don't mind, I'd like to ask how to also get the other columns, the ones that are not duplicated, because I need to show them in a grid. I guess the query would have to change again.
Centurion
@Vadim: It's not possible to show other columns in this format, unless you use `GROUP_CONCAT()` again on these columns. Imagine in the above example that we had another column called `value` and row 1 had a `value` of 100, row 2 a `value` of 200, etc. Now, since we grouped our final result set to show just the rows that have duplicates, what should be in the `value` column? 100, 400 or 500? (Because all those 3 records are grouped in one row)... However we could use `GROUP_CONCAT(value)` to have this column as a comma separated field just like the `dupes` column.
Daniel Vassallo
@Vadim: ... On the other hand, you may want to try @wimvds' solution below, which should show all duplicate records in separate rows.
Daniel Vassallo
@Daniel Vassallo Yes, I got it, thanks again.
Centurion
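The GROUP_CONCAT() idea from the comments above could look roughly like the sketch below. It assumes the extra `value` column from Daniel's comment, which is not part of the test case, and the `dupe_values` alias is made up for illustration. It also groups the table directly instead of self-joining it, which is a simplification and not the query from the answer.

```sql
-- Sketch only: assumes your_table has an extra column `value` (hypothetical).
-- Both lists are ordered by id so that the n-th id lines up with the n-th value.
SELECT    name, street, house,
          GROUP_CONCAT(id    ORDER BY id) AS dupes,
          GROUP_CONCAT(value ORDER BY id) AS dupe_values
FROM      your_table
GROUP BY  name, street, house
HAVING    COUNT(*) > 1;   -- keep only groups that contain more than one row
```

Two caveats: GROUP_CONCAT() skips NULLs, so the two lists only line up if `value` is never NULL, and MySQL truncates the result at `group_concat_max_len` (1024 bytes by default).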
+1  A: 

To get duplicates, just use a self-join on the table:

select t1.id, t2.id, t1.name, t1.street, t1.house
from your_table t1
inner join your_table t2 on t1.name=t2.name and t1.street=t2.street and t1.house=t2.house
where t1.id < t2.id;

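The t1.id < t2.id condition makes sure that every pair of duplicates appears only once and that no row is paired with itself. With the test data from the first answer, rows 1, 4 and 5 would come back as the pairs (1,4), (1,5) and (4,5).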

wimvds
+1 for the solution, but I need to be able to filter them, and the client will delete them, so I have to show all of them.
Centurion
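For the requirement in the last comment (every duplicated row on its own line, with all of its columns, so it can be filtered and deleted in a grid), one possible approach is to join the table against the list of duplicated (name, street, house) combinations. This is a sketch, not something proposed in either answer; the alias `d` is arbitrary.

```sql
-- Sketch: return every row that shares name, street and house with another row.
SELECT    t.*
FROM      your_table t
JOIN      (
            -- one row per (name, street, house) combination that occurs more than once
            SELECT   name, street, house
            FROM     your_table
            GROUP BY name, street, house
            HAVING   COUNT(*) > 1
          ) d ON  d.name   = t.name
              AND d.street = t.street
              AND d.house  = t.house
ORDER BY  t.name, t.street, t.house, t.id;
```

With the test data from the first answer this should return the rows with id 1, 4, 5, 6 and 7, in that order. Rows where name, street or house is NULL are not matched by the `=` comparisons and would need separate handling.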