views:

234

answers:

3

I wrote some code to select duplicates and group them using first and last names. I gather them into a multidimensional array and dedupe/merge them using jQuery/Ajax on the resulting page. I would like to ask if there is a better method of creating the array than how I'm doing it. Here is my code. Thank you.

$dataArr=fetchDups($conn, 13, 5); // get a few at a time

print '<div style="clear:both;"></div><pre>';
print_r($dataArr);
print '</pre><div style="clear:both;"></div>'; 


    function fetchDups($conn, $client_id, $limit='')
    {

     $sql=' SELECT * FROM `contacts` WHERE `clientid`=\'13\' GROUP BY fname, lname ';
     //$sql=' SELECT DISTICT fname, lname, * FROM `clients` WWHERE `clientid`=\'13\' ';
     $res=mysql_query($sql, $conn)or die(mysql_error());
     $contactsRow=array();
     while($row=mysql_fetch_array($res)){
      echo $row['fname'].'<br>';
      $contactsRow[]=$row;
     }

  mysql_freeresult($res);

  $dataArr=array();
  $i=0;  
  $limitNum=0;
  //----------------------------------  

  foreach($contactsRow AS $rowNew){

   $sql=' SELECT * FROM `contacts` WHERE `clientid`=\'13\' AND `id`!=\''.$rowNew['id'].'\'  
    AND (`fname` = \''.$rowNew['fname'].'\' OR `lname` = \''.$rowNew['lname'].'\')
   ';
   //echo $sql;
   $res=mysql_query($sql, $conn)or die(mysql_error());
   $rowCountDup=mysql_num_rows($res);

   if($rowCountDup>0){

    $d=0;
    $dataArr[$i]=array();
    $dataArr[$i][$d]=$rowNew;
    while($rowNew=mysql_fetch_array($res)){
     $dataArr[$i][($d+1)]=$rowNew;
     $d++;    
    }

    $i++;
    $limitNum++;    

   }

   // limit the results. too many crashes the browser
   if($limitNum==$limit){
    break;
   }   

  }

  mysql_freeresult($res);

  return $dataArr;

    }
A: 

If you just want to avoid displaying duplicates and not actually removing them from your db, use DISTINCT SQL keyword.

Soufiane Hassou
just noticed that the query is there but he has commented it out.
Shiv
A: 

For this kind of things, you should probably try using:

SELECT * FROM contacts refC JOIN contacts allC USING (fname, lname) WHERE refC.clientid='13'

This does a self-join on contacts based on first and last name, so allC aliases to the list of all contacts that share refC's first and last names (including himself).

This way, you get all the information you're looking for in only one SQL query. Tuning may be achieved on the query by adding an index on columns fname and lname of table contacts, so the join doesn't have to parse the whole table to match.

--edit: You may get to specify more finely how you join your tables as for instance:

SELECT *
FROM contacts refC
JOIN contacts allC ON (allC.fname LIKE CONCAT(refC.fname, '%') AND allC.lname LIKE CONCAT(refC.lname, '%'))
WHERE refC.clientid='13'

Which is strictly equivalent (but IMO easier to read than) to:

SELECT *
FROM contacts refC,contacts allC
WHERE allC.fname LIKE CONCAT(refC.fname, '%') 
AND allC.lname LIKE CONCAT(refC.lname, '%')
AND refC.clientid='13'
Romain
Forgot to mention... There are many reasons why it's not adviseable to `SELECT *`, my favourite is that it induces too much coupling between the application logic and the database structure (ordering of columns becomes important for the code, whereas it shouldn't).
Romain
@Romain: "ordering of columns becomes important for the code"... Really? Only if you write your code to rely on the order, surely. Whether you get it as an associative array or as an object, the order is neither here nor there. Select * is bad only because it potentially retrieves unneeded data, IMO
Flubba
I do agree. But if you place yourself in the head of a DBA and/or expect your queries to get re-used by other people, you might want to consider these people may not be so religious on GPP :)
Romain
In my situation, I need all the data because When I display the data on the page as groups of duplicates, I'm using some Jquery/Ajax to allow the user to pick which data they want to keep from each contact to create a new contact with all the correct information and save it as a new contact, then delete all the other dups. Also, some data I need to search for is address, and phone. I need to use LIKE% because some people might type things in slightly differently.
EricP
I adjusted my answer to fit your needs there. You may change the `ON` clause to fit whatever matching you feel is appropriate.
Romain
Thank you all. I can almost see how I can use your query Romain, but then I have to separate and group the dups somehow in php. And I don't see how it can be done. I need to select distinct names and query the table to create an array for each name if there are any dups. My code is working ok right now after I adjusted it, but I thought there might be a better way.
EricP
If you order your resultet using an ORDER BY clause based on names, you can iterate over it in PHP and you'll know you change "unique" user when either of the names doesn't match the one of the preceding row... Easy!
Romain
A: 

Or you could try something like the second query here which uses a derived table:

mysql> select * from contacts ;
+----+--------+---------+
| id | fname  | lname   |
+----+--------+---------+
| 1  | Annie  | Haddock |
| 2  | Annie  | Haddock |
| 3  | Ginger | Mole    |
| 4  | Ted    | Ted     |
| 5  | Ted    | Ted     |
+----+--------+---------+
5 rows in set (0.01 sec)

mysql> select id, fname, lname, total from 
    (select *, count(*) as total 
     from contacts group by fname, lname) people 
       where total > 1;
+-----------+--------------+--------------+--------------+
| people.id | people.fname | people.lname | people.total |
+-----------+--------------+--------------+--------------+
| 1         | Annie        | Haddock      | 2            |
| 4         | Ted          | Ted          | 2            |
+-----------+--------------+--------------+--------------+
2 rows in set (0.01 sec)

then just iterate through it with foreach. Note that "people" above is an alias for the derived table created by the inner select

Flubba