views:

56

answers:

2

I have this query

SELECT  id, timeOrdered, order_desc, firstname, lastname
FROM `db`.`myTable` 
WHERE `myTable`.`customer_number` IN (100, 101, 102, 103, 104, 105)

I am trying to find which are duplicate records. If the timeOrdered, order_desc, firstname, lastname are same then its dupe.

What would be the query for that
Thanks

+2  A: 
SELECT DISTINCT
    t1.id
FROM 
    mytable t1
    INNER JOIN mytable t2 ON 
        t1.timeordered = t2.timeordered AND
        t1.order_desc = t2.order_desc AND
        t1.firstname = t2.firstname AND
        t1.lastname = t2.lastname AND
        t1.id <> t2.id
Andy Evans
This query would generate duplicates in the result set, if for example there are 3 rows that are equal. A distinct 'could' help. However, this solution is using joins for comparison, that is quicker most of time. +1 :)
SDReyes
@SDReyes: The duplication is a reality of JOINing; EXISTS can be faster when dealing with 2+ matches, and not duplicate results. `IN` generally doesn't support tuples (but does on MySQL)
OMG Ponies
@OMG Ponnies: Yes. your solution takes 79 ms testing on SQL Server with ~100 000 rows. using joins it would take 1872ms. awesome : O. congrats OMG! :)
SDReyes
+8  A: 

To see what values have duplicates:

  SELECT t.order_desc, 
         t.firstname,
         t.lastname
    FROM db.mytable t
   WHERE t.customer_number IN (100, 101, 102, 103, 104, 105)
GROUP BY t.order_desc, t.firstname, t.lastname, t.timeordered
  HAVING COUNT(*) > 1

To see the entire record associated with those duplicates:

SELECT x.*
  FROM db.mytable x
  WHERE EXISTS(SELECT NULL
                 FROM db.mytable t
                WHERE t.customer_number IN (100, 101, 102, 103, 104, 105)
                  AND t.order_desc = x.order_desc
                  AND t.firstname = x.firstname
                  AND t.lastname = x.lastname
                  AND t.timeordered = x.timeordered
             GROUP BY t.order_desc, t.firstname, t.lastname, t.timeordered
               HAVING COUNT(*) > 1) 
 WHERE x.customer_number IN (100, 101, 102, 103, 104, 105)
OMG Ponies
Add the timeOrdered column to the comparison
SDReyes
@SDReyes: Corrected, thx.
OMG Ponies
Remove the where clause in the last line
SDReyes
79ms Awesome :D +1
SDReyes