ansaurus

Question

Answer 1

+1 A:

You could find this out by grouping the Names together, and only listing those where there is more than one record:

SELECT OldTable.Name, COUNT(*) AS Duplicates
FROM OldTable
GROUP BY OldTable.Name
-- Repeat the aggregate here: not every DBMS accepts a column alias in HAVING.
HAVING COUNT(*) > 1

Should output:

OldTable:
Name | Duplicates
-----+------------
A    | 2
C    | 2

Gus 2010-09-23 00:11:14

Except that this won't catch cases where two names have the same number of entries but the entries themselves differ.

Apeiron 2010-09-23 15:34:26
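
To make the limitation in the comment above concrete, here is a minimal sketch that runs the Answer 1 query against an in-memory SQLite database. The sample rows and the helper name names_with_multiple_rows are hypothetical (the question's actual data is not shown in this thread); the State and Strat columns are borrowed from Answer 2.

```python
import sqlite3

# Hypothetical sample data: A and C each have two rows,
# but they share no (State, Strat) combination at all.
ROWS = [
    ("A", "NY", "s1"), ("A", "CA", "s2"),
    ("B", "NY", "s1"),
    ("C", "TX", "s3"), ("C", "FL", "s4"),
]


def names_with_multiple_rows(rows):
    """Run the GROUP BY / HAVING query from Answer 1 on the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OldTable (Name TEXT, State TEXT, Strat TEXT)")
    conn.executemany("INSERT INTO OldTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Name, COUNT(*) AS Duplicates
        FROM OldTable
        GROUP BY Name
        HAVING COUNT(*) > 1
        ORDER BY Name
        """
    ).fetchall()
    conn.close()
    return result


print(names_with_multiple_rows(ROWS))  # [('A', 2), ('C', 2)]
```

Both A and C are reported even though their entries have nothing in common, so the row count alone does not show that two names are duplicates of each other.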

Answer 2

+1 A:

Try:

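SELECT OT1.Name Name1, OT2.Name Name2
FROM OldTable OT1
-- Pair each row with the rows of a different name that have the same State and Strat.
JOIN OldTable OT2 ON OT1.Name < OT2.Name AND
                     OT1.State = OT2.State AND
                     OT1.Strat = OT2.Strat
GROUP BY OT1.Name, OT2.Name
-- Keep a pair only if the number of shared rows equals each name's total row count.
HAVING COUNT(*) = (SELECT COUNT(*) FROM OldTable TC1 WHERE TC1.Name = OT1.Name)
   AND COUNT(*) = (SELECT COUNT(*) FROM OldTable TC2 WHERE TC2.Name = OT2.Name)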

Mark Bannister 2010-09-23 14:15:07
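
For anyone who wants to try the query above, here is a minimal sketch using Python's sqlite3 module. The sample rows and the helper name duplicate_name_pairs are hypothetical, and the sketch assumes each (Name, State, Strat) row appears at most once; repeated rows would inflate the join count and could produce false matches.

```python
import sqlite3

# Hypothetical sample data:
#   A and C have exactly the same two entries,
#   B has only one of them,
#   D has two entries but only one in common with A.
ROWS = [
    ("A", "NY", "s1"), ("A", "CA", "s2"),
    ("B", "NY", "s1"),
    ("C", "NY", "s1"), ("C", "CA", "s2"),
    ("D", "NY", "s1"), ("D", "TX", "s3"),
]


def duplicate_name_pairs(rows):
    """Return pairs of names whose sets of (State, Strat) rows are identical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OldTable (Name TEXT, State TEXT, Strat TEXT)")
    conn.executemany("INSERT INTO OldTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT OT1.Name AS Name1, OT2.Name AS Name2
        FROM OldTable OT1
        JOIN OldTable OT2 ON OT1.Name < OT2.Name
                         AND OT1.State = OT2.State
                         AND OT1.Strat = OT2.Strat
        GROUP BY OT1.Name, OT2.Name
        HAVING COUNT(*) = (SELECT COUNT(*) FROM OldTable TC1 WHERE TC1.Name = OT1.Name)
           AND COUNT(*) = (SELECT COUNT(*) FROM OldTable TC2 WHERE TC2.Name = OT2.Name)
        ORDER BY OT1.Name, OT2.Name
        """
    ).fetchall()
    conn.close()
    return result


print(duplicate_name_pairs(ROWS))  # [('A', 'C')]
```

Only the A/C pair survives: D has the same number of entries as A but different ones, and B shares an entry with A but has fewer rows.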

Very nice solution. I tested it with a few data sets that I knew contained duplicates and, lo and behold, it worked. One thing I don't understand is OT1.Name < OT2.Name versus OT1.Name != OT2.Name; perhaps this is just a difference between DBMSs? Now I just need to figure this out: I understand the join, but not entirely the HAVING section.

Apeiron 2010-09-23 15:51:49

@Apeiron, I was concerned about the possibility that there might be many more than two names with the same set of states and strats. `OT1.Name < OT2.Name` returns half as many rows as `OT1.Name != OT2.Name`, because each matching pair is listed once instead of in both orders. (For example, if 4 different names all share the same set of states and strats, the former returns 6 rows [ab, ac, ad, bc, bd, cd] while the latter returns 12 rows [ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc].)

Mark Bannister 2010-09-23 16:11:27
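
The 6-versus-12 example in the comment above can be checked with a short sketch. It assumes an in-memory SQLite table in which four hypothetical names (a, b, c, d) each have the same single State/Strat row; the helper name matching_pairs is likewise made up for illustration.

```python
import sqlite3


def matching_pairs(comparison):
    """Run the pairing query with either '<' or '!=' between the two names."""
    if comparison not in ("<", "!="):
        raise ValueError("comparison must be '<' or '!='")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OldTable (Name TEXT, State TEXT, Strat TEXT)")
    # Four names that all share the same set of states and strats.
    conn.executemany(
        "INSERT INTO OldTable VALUES (?, ?, ?)",
        [(name, "NY", "s1") for name in "abcd"],
    )
    # The operator comes from the whitelist above, so the f-string is safe here.
    query = f"""
        SELECT OT1.Name || OT2.Name
        FROM OldTable OT1
        JOIN OldTable OT2 ON OT1.Name {comparison} OT2.Name
                         AND OT1.State = OT2.State
                         AND OT1.Strat = OT2.Strat
        GROUP BY OT1.Name, OT2.Name
        HAVING COUNT(*) = (SELECT COUNT(*) FROM OldTable TC1 WHERE TC1.Name = OT1.Name)
           AND COUNT(*) = (SELECT COUNT(*) FROM OldTable TC2 WHERE TC2.Name = OT2.Name)
        ORDER BY OT1.Name, OT2.Name
    """
    pairs = [row[0] for row in conn.execute(query)]
    conn.close()
    return pairs


print(matching_pairs("<"))   # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
print(matching_pairs("!="))  # the same six pairs plus their six mirror images
```

With `<` each pair appears once; with `!=` every pair also appears in reverse order, which doubles the result without adding information.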

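The main part of the query counts the State/Strat combinations that each pair of names has in common, while each of the HAVING conditions compares that count with the total number of combinations for one of the names in the pair. A pair is returned only when all three counts are equal, i.e. when every combination belonging to one name also belongs to the other.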

Mark Bannister 2010-09-23 16:14:51
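
One way to see what the HAVING section compares is to drop it and select the three counts side by side. The sketch below does that on an in-memory SQLite table; the sample rows, the helper name pair_counts, and the column aliases Shared, Total1 and Total2 are all hypothetical and not part of the original query.

```python
import sqlite3

# Hypothetical sample data: only A and C have identical sets of entries.
ROWS = [
    ("A", "NY", "s1"), ("A", "CA", "s2"),
    ("B", "NY", "s1"),
    ("C", "NY", "s1"), ("C", "CA", "s2"),
    ("D", "NY", "s1"), ("D", "TX", "s3"),
]


def pair_counts(rows):
    """For each pair of names, return (Name1, Name2, Shared, Total1, Total2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OldTable (Name TEXT, State TEXT, Strat TEXT)")
    conn.executemany("INSERT INTO OldTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT OT1.Name, OT2.Name,
               COUNT(*) AS Shared,  -- combinations the pair has in common
               (SELECT COUNT(*) FROM OldTable TC1 WHERE TC1.Name = OT1.Name) AS Total1,
               (SELECT COUNT(*) FROM OldTable TC2 WHERE TC2.Name = OT2.Name) AS Total2
        FROM OldTable OT1
        JOIN OldTable OT2 ON OT1.Name < OT2.Name
                         AND OT1.State = OT2.State
                         AND OT1.Strat = OT2.Strat
        GROUP BY OT1.Name, OT2.Name
        ORDER BY OT1.Name, OT2.Name
        """
    ).fetchall()
    conn.close()
    return result


for row in pair_counts(ROWS):
    print(row)
# ('A', 'B', 1, 2, 1)
# ('A', 'C', 2, 2, 2)
# ('A', 'D', 1, 2, 2)
# ('B', 'C', 1, 1, 2)
# ('B', 'D', 1, 1, 2)
# ('C', 'D', 1, 2, 2)
```

The HAVING conditions keep only the rows where Shared equals both totals, which here is the A/C row alone. Pairs with nothing in common never appear, because the inner join produces no rows for them.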

SQL Finding multiple-line duplicates
