ansaurus

Question

Select duplicates with PHP & MySQL for merging process

Answer 1

A:

If you just want to avoid displaying duplicates, rather than actually removing them from your database, use the SQL DISTINCT keyword.

Soufiane Hassou 2009-12-03 00:47:14

Just noticed that the query is there, but he has commented it out.

Shiv 2009-12-03 02:10:39

Answer 2

A:

For this kind of thing, you should probably try using:

SELECT * FROM contacts refC JOIN contacts allC USING (fname, lname) WHERE refC.clientid='13'

This does a self-join on contacts based on first and last name, so allC aliases the list of all contacts that share refC's first and last names (including refC's own row).

This way, you get all the information you're looking for in a single SQL query. The query can be tuned by adding an index on the columns fname and lname of table contacts, so the join doesn't have to scan the whole table to find matches.
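
A minimal sketch of the self-join and the suggested index, written in Python against an in-memory SQLite database as a stand-in for MySQL. The table layout, the sample rows, the index name idx_contacts_name and the helper same_name_contacts are assumptions made for illustration, and clientid is treated here as identifying a single reference row.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (clientid INTEGER PRIMARY KEY, fname TEXT, lname TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (clientid, fname, lname) VALUES (?, ?, ?)",
    [(13, "Annie", "Haddock"), (14, "Annie", "Haddock"), (15, "Ginger", "Mole")],
)

# Composite index on the join columns, as suggested above (hypothetical name).
conn.execute("CREATE INDEX idx_contacts_name ON contacts (fname, lname)")


def same_name_contacts(conn, clientid):
    """Return every contact sharing first and last name with the given row."""
    query = """
        SELECT allC.clientid, fname, lname
        FROM contacts refC
        JOIN contacts allC USING (fname, lname)
        WHERE refC.clientid = ?
        ORDER BY allC.clientid
    """
    return conn.execute(query, (clientid,)).fetchall()


# The reference row itself is part of the result, as noted above.
matches = same_name_contacts(conn, 13)
```

Listing the columns explicitly instead of using SELECT * also keeps the result shape independent of the table definition.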

--edit: You can specify more precisely how the tables are joined, for instance:

SELECT *
FROM contacts refC
JOIN contacts allC ON (allC.fname LIKE CONCAT(refC.fname, '%') AND allC.lname LIKE CONCAT(refC.lname, '%'))
WHERE refC.clientid='13'

Which is strictly equivalent to (but IMO easier to read than):

SELECT *
FROM contacts refC,contacts allC
WHERE allC.fname LIKE CONCAT(refC.fname, '%') 
AND allC.lname LIKE CONCAT(refC.lname, '%')
AND refC.clientid='13'
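
One property of this prefix join worth checking is that it only matches in one direction: allC's names must start with refC's names, not the other way round. The sketch below illustrates this in Python with an in-memory SQLite database as a stand-in for MySQL; SQLite's || operator replaces CONCAT, and the sample rows and the helper prefix_matches are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (clientid INTEGER PRIMARY KEY, fname TEXT, lname TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (clientid, fname, lname) VALUES (?, ?, ?)",
    [(13, "Ann", "Haddock"), (14, "Annie", "Haddock"), (15, "Ginger", "Mole")],
)

# Same join as above, with SQLite's || in place of MySQL's CONCAT().
PREFIX_JOIN = """
    SELECT allC.clientid
    FROM contacts refC
    JOIN contacts allC
      ON allC.fname LIKE refC.fname || '%'
     AND allC.lname LIKE refC.lname || '%'
    WHERE refC.clientid = ?
    ORDER BY allC.clientid
"""


def prefix_matches(conn, clientid):
    """Return the ids of contacts whose names start with the reference names."""
    return [row[0] for row in conn.execute(PREFIX_JOIN, (clientid,))]


from_ann = prefix_matches(conn, 13)    # reference "Ann" also finds "Annie"
from_annie = prefix_matches(conn, 14)  # reference "Annie" does not find "Ann"
```

If both directions matter, the ON clause would need the reverse comparison as well.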

Romain 2009-12-03 08:44:43

Forgot to mention... There are many reasons why it's not advisable to `SELECT *`. My favourite is that it induces too much coupling between the application logic and the database structure (the ordering of columns becomes important for the code, whereas it shouldn't).

Romain 2009-12-03 08:54:35

@Romain: "ordering of columns becomes important for the code"... Really? Only if you write your code to rely on the order, surely. Whether you get it as an associative array or as an object, the order is neither here nor there. Select * is bad only because it potentially retrieves unneeded data, IMO
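
A small sketch of the two access styles under discussion, in Python with an in-memory SQLite database as a stand-in for PHP and MySQL. The table and the sample row are hypothetical; sqlite3.Row plays the role of an associative array here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows allow access by column name
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
conn.execute("INSERT INTO contacts VALUES (1, 'Annie', 'Haddock')")

row = conn.execute("SELECT * FROM contacts").fetchone()

by_position = row[1]    # depends on fname being the second column
by_name = row["fname"]  # unaffected if columns are added or reordered
```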

Flubba 2009-12-03 09:21:24

I do agree. But if you put yourself in the head of a DBA and/or expect your queries to get re-used by other people, you might want to consider that these people may not be so religious about GPP :)

Romain 2009-12-03 09:36:56

In my situation, I need all the data because, when I display the data on the page as groups of duplicates, I'm using some jQuery/Ajax to let the user pick which data they want to keep from each contact. That creates a new contact with all the correct information, which is saved, and then all the other dups are deleted. Also, some of the data I need to search on is address and phone. I need to use LIKE with % because some people might type things in slightly differently.

EricP 2009-12-03 13:10:38

I adjusted my answer to fit your needs there. You may change the `ON` clause to fit whatever matching you feel is appropriate.

Romain 2009-12-03 13:41:55

Thank you all. I can almost see how I can use your query, Romain, but then I have to separate and group the dups somehow in PHP, and I don't see how it can be done. I need to select distinct names and query the table to create an array for each name if there are any dups. My code is working OK right now after I adjusted it, but I thought there might be a better way.

EricP 2009-12-03 14:54:34

If you order your result set using an ORDER BY clause based on the names, you can iterate over it in PHP, and you'll know you have moved on to a new "unique" user when either of the names doesn't match that of the preceding row... Easy!
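
A minimal sketch of that iteration, written in Python rather than PHP. The rows are hypothetical and assumed to arrive already sorted, as a query ending in ORDER BY lname, fname would return them; duplicate_groups is an illustrative helper name.

```python
from itertools import groupby

# Hypothetical rows, already sorted by (lname, fname) as ORDER BY would do.
rows = [
    {"id": 1, "fname": "Annie", "lname": "Haddock"},
    {"id": 2, "fname": "Annie", "lname": "Haddock"},
    {"id": 3, "fname": "Ginger", "lname": "Mole"},
    {"id": 4, "fname": "Ted", "lname": "Ted"},
    {"id": 5, "fname": "Ted", "lname": "Ted"},
]


def duplicate_groups(sorted_rows):
    """Group consecutive rows with the same name; keep groups of two or more."""
    groups = []
    name_key = lambda r: (r["lname"], r["fname"])
    for _name, group in groupby(sorted_rows, key=name_key):
        members = list(group)
        if len(members) > 1:
            groups.append(members)
    return groups


dups = duplicate_groups(rows)
dup_ids = [[r["id"] for r in group] for group in dups]
```

groupby only merges adjacent rows, which is why the sort order matters: an unsorted result set would split one person's duplicates across several groups.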

Romain 2009-12-03 17:42:19

Answer 3

A:

Or you could try something like the second query here which uses a derived table:

mysql> select * from contacts ;
+----+--------+---------+
| id | fname  | lname   |
+----+--------+---------+
| 1  | Annie  | Haddock |
| 2  | Annie  | Haddock |
| 3  | Ginger | Mole    |
| 4  | Ted    | Ted     |
| 5  | Ted    | Ted     |
+----+--------+---------+
5 rows in set (0.01 sec)

mysql> select id, fname, lname, total from
    (select min(id) as id, fname, lname, count(*) as total
     from contacts group by fname, lname) people
       where total > 1;
+-----------+--------------+--------------+--------------+
| people.id | people.fname | people.lname | people.total |
+-----------+--------------+--------------+--------------+
| 1         | Annie        | Haddock      | 2            |
| 4         | Ted          | Ted          | 2            |
+-----------+--------------+--------------+--------------+
2 rows in set (0.01 sec)

Then just iterate through the result with foreach. Note that "people" above is an alias for the derived table created by the inner select.
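
A sketch of the same derived-table approach, run from Python against an in-memory SQLite database as a stand-in for PHP and MySQL. It uses MIN(id) to pick one representative row per name, which is an assumption made so the query does not depend on how the server handles non-aggregated columns in a GROUP BY; the sample rows mirror the table above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO contacts (id, fname, lname) VALUES (?, ?, ?)",
    [
        (1, "Annie", "Haddock"),
        (2, "Annie", "Haddock"),
        (3, "Ginger", "Mole"),
        (4, "Ted", "Ted"),
        (5, "Ted", "Ted"),
    ],
)

# "people" is the alias of the derived table built by the inner select.
query = """
    SELECT id, fname, lname, total
    FROM (SELECT MIN(id) AS id, fname, lname, COUNT(*) AS total
          FROM contacts
          GROUP BY fname, lname) people
    WHERE total > 1
    ORDER BY id
"""
duplicates = conn.execute(query).fetchall()

# Iterate over the duplicated names, as a foreach would in PHP.
labels = [f"{fname} {lname} x{total}" for _id, fname, lname, total in duplicates]
```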

Flubba 2009-12-03 09:17:00
