ansaurus

Question

What's the SQL query to list all rows that have 2 column sub-rows as duplicates?

Answer 1

+4 A:

Join on yourself like this:

SELECT a.col3, b.col3, a.col1, a.col2 
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3

If you're using PostgreSQL and the table was created WITH OIDS, you can use the oid to return fewer duplicate results, like this:

SELECT a.col3, b.col3, a.col1, a.col2 
FROM tablename a, tablename b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
  AND a.oid < b.oid
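
The effect of the extra ordering condition can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: SQLite's implicit rowid stands in for the PostgreSQL oid, and the three sample rows are made up for the demo (the blah_x and blah_j values are borrowed from the results quoted elsewhere on this page).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("cc", 113, "blah_m"),  # hypothetical row with no duplicate
    ],
)

base = """
    SELECT a.col3, b.col3, a.col1, a.col2
    FROM tablename a, tablename b
    WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
"""

# Plain self-join: every pair shows up twice, once in each direction.
mirrored = conn.execute(base).fetchall()

# Ordering on a unique per-row value keeps one direction of each pair.
# rowid is SQLite-specific and is used here in place of oid.
once = conn.execute(base + " AND a.rowid < b.rowid").fetchall()

print(len(mirrored), "rows without the ordering condition")
print(len(once), "row with it:", once)
```

Any unique column (a primary key, for example) should work the same way as rowid does here.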

Jerub 2008-09-25 01:35:57

Answer 2

+2 A:

Don't have a database handy to test this, but I think it should work...

select
  *
from
  theTable
where
  col1 in
    (
    select
      col1
    from
      theTable
    group by
      col1||col2
    having
      count(col1||col2) > 1
    )

dacracot 2008-09-25 01:37:40

This fails on SQL Server because 'col1' isn't present in the GROUP BY clause. I'm pretty sure this will fail on most other SQL databases.

Craig Trader 2008-09-25 02:17:18

Answer 3

+2 A:

My naive attempt would be

select a.*, b.* from table a, table b where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 != b.col3;

but that would return all the rows twice. I'm not sure how you'd restrict it to just returning them once. Maybe if there was a primary key, you could add "and a.pkey < b.pkey".

Like I said, that's not elegant and there is probably a better way to do this.

Paul Tomblin 2008-09-25 01:38:39

Answer 4

+5 A:

With the data you have listed, your query is not possible: rows 5 & 6 are identical in every column, so nothing in the data distinguishes one from the other.

Assuming that your table is named 'quux', if you start with something like this:

SELECT a.COL1, a.COL2, a.COL3 
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.COL3 <> b.COL3
ORDER BY a.COL1, a.COL2

You'll end up with this answer:

 COL1   COL2   COL3
 ---------------------
 aa     111    blah_x
 aa     111    blah_j

That's because rows 5 & 6 have the same values for COL3. Any query that returns both rows 5 & 6 will also return duplicates of ALL of the rows in this dataset.

On the other hand, if you have a primary key (ID), then you can use this query instead:

SELECT a.COL1, a.COL2, a.COL3
FROM quux a, quux b
WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.ID <> b.ID
ORDER BY a.COL1, a.COL2

[Edited to simplify the WHERE clause]

And you'll get the results you want:

COL1   COL2   COL3
---------------------
aa     111    blah_x
aa     111    blah_j
bb     112    blah_d
bb     112    blah_d

I just tested this on SQL Server 2000, but you should see the same results on any modern SQL database.
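
A rough way to reproduce the difference between the two queries, using Python's sqlite3 module rather than SQL Server. The table contents are an assumption: the four duplicate rows come from the results above, while rows 3 and 4 and the exact ID assignments are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quux (ID INTEGER PRIMARY KEY, COL1 TEXT, COL2 INTEGER, COL3 TEXT)"
)
conn.executemany(
    "INSERT INTO quux (ID, COL1, COL2, COL3) VALUES (?, ?, ?, ?)",
    [
        (1, "aa", 111, "blah_x"),
        (2, "aa", 111, "blah_j"),
        (3, "cc", 113, "blah_m"),  # hypothetical non-duplicate row
        (4, "dd", 114, "blah_n"),  # hypothetical non-duplicate row
        (5, "bb", 112, "blah_d"),
        (6, "bb", 112, "blah_d"),  # identical to row 5 apart from ID
    ],
)

query = """
    SELECT a.COL1, a.COL2, a.COL3
    FROM quux a, quux b
    WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND {guard}
    ORDER BY a.COL1, a.COL2
"""

# Guarding with COL3 misses rows 5 and 6 because their COL3 values match.
by_col3 = conn.execute(query.format(guard="a.COL3 <> b.COL3")).fetchall()

# Guarding with the primary key only excludes a row joining to itself.
by_id = conn.execute(query.format(guard="a.ID <> b.ID")).fetchall()

print("COL3 guard:", by_col3)
print("ID guard:  ", by_id)
```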

Blorgbeard proved me wrong -- good for him!

Craig Trader 2008-09-25 01:40:02

Answer 5

+6 A:

Does this work for you?

select t.* from table t
left join ( select col1, col2, count(*) as count from table group by col1, col2 ) c on t.col1=c.col1 and t.col2=c.col2
where c.count > 1

Blorgbeard 2008-09-25 01:40:43

This is a correct answer. I think mine will run a hair faster on a large database, but I'd leave that up to a DBA to decide.

Craig Trader 2008-09-25 02:24:57

Left join is not needed due to criteria on the right side.
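
A small check of that point, sketched with Python's sqlite3 module. The table name the_table, the alias cnt, and rows 3 and 4 are assumptions made for the demo (table and count are reserved words in many databases, so they are avoided here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("cc", 113, "blah_m"),  # hypothetical non-duplicate row
        ("dd", 114, "blah_n"),  # hypothetical non-duplicate row
        ("bb", 112, "blah_d"),
        ("bb", 112, "blah_d"),
    ],
)

template = """
    SELECT t.*
    FROM the_table t
    {join_kind} JOIN (
        SELECT col1, col2, COUNT(*) AS cnt
        FROM the_table
        GROUP BY col1, col2
    ) c ON t.col1 = c.col1 AND t.col2 = c.col2
    WHERE c.cnt > 1
    ORDER BY t.col1, t.col2, t.col3
"""

left_rows = conn.execute(template.format(join_kind="LEFT")).fetchall()
inner_rows = conn.execute(template.format(join_kind="INNER")).fetchall()

# The WHERE clause tests c.cnt, so any unmatched (NULL) row from a left
# join would be filtered out anyway; both forms return the same rows.
print(left_rows == inner_rows)
print(inner_rows)
```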

David B 2008-09-25 02:32:05

Looks slower than a solution based on an analytic function to me.

David Aldridge 2008-09-25 13:16:31

Answer 6

+2 A:

Something like this should work:

SELECT a.COL1, a.COL2, a.COL3
FROM YourTable a
JOIN YourTable b ON b.COL1 = a.COL1 AND b.COL2 = a.COL2 AND b.COL3 <> a.COL3

In general, the JOIN clause should include every column that you're considering to be part of a "duplicate" (COL1 and COL2 in this case), and at least one column (or as many as it takes) to eliminate a row joining to itself (COL3, in this case).

Jonathan Schuster 2008-09-25 01:43:11

Answer 7

+2 A:

This is pretty similar to the self-join, except it will not have the duplicates.

select COL1,COL2,COL3
from theTable a
where exists (select 'x'
              from theTable b
              where a.col1=b.col1
              and   a.col2=b.col2
              and   a.col3<>b.col3)
order by col1,col2,col3
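
To see what "will not have the duplicates" means in practice, here is a hedged sketch using Python's sqlite3 module. The data is hypothetical: a group of three rows sharing col1 and col2 (blah_k is invented) makes the difference visible, since each row then has two join partners.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO theTable VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("aa", 111, "blah_k"),  # hypothetical third member of the group
        ("cc", 113, "blah_m"),  # hypothetical non-duplicate row
    ],
)

# Self-join: a row appears once for every partner it matches.
joined = conn.execute("""
    SELECT a.col1, a.col2, a.col3
    FROM theTable a
    JOIN theTable b
      ON b.col1 = a.col1 AND b.col2 = a.col2 AND b.col3 <> a.col3
""").fetchall()

# EXISTS: a row appears once if at least one partner exists.
existing = conn.execute("""
    SELECT col1, col2, col3
    FROM theTable a
    WHERE EXISTS (SELECT 'x'
                  FROM theTable b
                  WHERE a.col1 = b.col1
                    AND a.col2 = b.col2
                    AND a.col3 <> b.col3)
    ORDER BY col1, col2, col3
""").fetchall()

print(len(joined), "rows from the self-join")
print(len(existing), "rows from EXISTS:", existing)
```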

IK 2008-09-25 01:48:08

Answer 8

A:

select COL1,COL2,COL3
from table
group by COL1,COL2,COL3
having count(*)>1

2008-09-25 02:43:10

This does not work. Examine the blah_x row in the question to understand why.
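
A quick illustration of the problem, sketched with Python's sqlite3 module. The table name the_table and rows 3 and 4 are assumptions for the demo. Grouping by all three columns only detects rows that are identical in every column, so the blah_x and blah_j rows, which share col1 and col2 but differ in col3, are missed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("cc", 113, "blah_m"),  # hypothetical non-duplicate row
        ("dd", 114, "blah_n"),  # hypothetical non-duplicate row
        ("bb", 112, "blah_d"),
        ("bb", 112, "blah_d"),
    ],
)

# Grouping by all three columns: only fully identical rows count as duplicates.
three_col = conn.execute("""
    SELECT col1, col2, col3
    FROM the_table
    GROUP BY col1, col2, col3
    HAVING COUNT(*) > 1
""").fetchall()

# Grouping by the two key columns finds both duplicated (col1, col2) pairs.
two_col = conn.execute("""
    SELECT col1, col2
    FROM the_table
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    ORDER BY col1, col2
""").fetchall()

print("grouped by col1, col2, col3:", three_col)
print("grouped by col1, col2:      ", two_col)
```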

David B 2008-09-27 03:08:25

Answer 9

A:

Forget joins -- use an analytic function:

select col1, col2, col3
from
(
select col1, col2, col3, count(*) over (partition by col1, col2) rows_per_col1_col2
from table
)
where rows_per_col1_col2 > 1
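
A runnable sketch of the same idea with Python's sqlite3 module. Assumptions: the bundled SQLite is version 3.25 or later (the first release with window functions), the table is named the_table, and rows 3 and 4 are invented. The derived table is given an alias (counted) because some databases require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("cc", 113, "blah_m"),  # hypothetical non-duplicate row
        ("dd", 114, "blah_n"),  # hypothetical non-duplicate row
        ("bb", 112, "blah_d"),
        ("bb", 112, "blah_d"),
    ],
)

# COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row,
# so the outer query can filter on it without a join.
windowed = conn.execute("""
    SELECT col1, col2, col3
    FROM (
        SELECT col1, col2, col3,
               COUNT(*) OVER (PARTITION BY col1, col2) AS rows_per_col1_col2
        FROM the_table
    ) AS counted
    WHERE rows_per_col1_col2 > 1
    ORDER BY col1, col2, col3
""").fetchall()

print(windowed)
```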

David Aldridge 2008-09-25 03:27:28

That only works if your database supports it. SQL Server 2005 does, and presumably Oracle does. SQL Server 2000 does not, nor does MySQL or PostgreSQL.

Craig Trader 2008-09-25 22:13:48

Ah, something new to learn. Should there be another from clause in this?

David B 2008-09-27 03:10:50

Heh, yes of course. Thanks.

David Aldridge 2008-10-01 04:17:18

Answer 10

+1 A:

Here is how you find duplicates. Tested in Oracle 10g with your data.

select * from tst where (col1, col2) in (select col1, col2 from tst group by col1, col2 having count(*) > 1)

Kyle Dyer 2008-10-01 04:46:39
