ansaurus

Question

How can you find the rows with equal columns?

Answer 1

+1 A:

select * from foo where a = b

Or am I missing something?

===

Update for clarity:

select * from 
foo as a
inner join foo as b
on a.a = b.a AND b.a = b.b
and a.id != b.id

++++++++++ After 3rd clarity edit:

select f1.id
FROM foo as f1
INNER JOIN foo as f2
ON f1.a = f2.a AND f1.b=f2.b AND f1.id != f2.id

But I'm shot, so check it yourself.

timdev 2009-09-17 04:55:55

updated question since it wasn't clear

Paul Tarjan 2009-09-17 04:58:06

Answer 2

A:

shouldn't this work?

SELECT * FROM foo WHERE a = b

=== edit ===

then how about

SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1

=== final re-edit before i give up on this question ===

SELECT foo.* FROM foo, (
   SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1
) foo2
WHERE foo.a = foo2.a AND foo.b = foo2.b

Lukman 2009-09-17 04:56:57

updated question since it wasn't clear

Paul Tarjan 2009-09-17 04:58:36
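
A minimal sketch of the two GROUP BY steps above, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The table layout (id, a, b) and the three sample rows are assumptions made up for illustration, not the asker's data.

```python
import sqlite3

# Hypothetical sample data: rows 1 and 2 share (a, b) = (1, 1); row 3 is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO foo (id, a, b) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (3, 2, 2)])

# Step 1: which (a, b) combinations occur more than once?
dup_keys = conn.execute(
    "SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1"
).fetchall()

# Step 2: join back to foo to get every full row that carries such a combination.
dup_rows = conn.execute("""
    SELECT foo.id, foo.a, foo.b
      FROM foo
      JOIN (SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1) AS foo2
        ON foo.a = foo2.a AND foo.b = foo2.b
     ORDER BY foo.id
""").fetchall()

print(dup_keys)
print(dup_rows)
```

The grouped query alone returns only the duplicated key, while the join returns each row that has it, which is what the final edit in this answer is after.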

Answer 3

+1 A:

SELECT * 
FROM foo first
JOIN foo second
  ON ( first.a = second.a
       AND first.b = second.b ) 
  AND (first.id <> second.id )

This should come up with all the rows where more than one row has the same combination of a and b.

Just hope you have an index on columns a and b.

James Anderson 2009-09-17 05:02:35

Paul - not to sound like a complete reputation hooker, but why accept an answer that doesn't actually answer what you stated was your ultimate goal? :)

DVK 2009-09-17 06:11:20

I sense this query will return a lot of duplicates. Not a good query ...

Lukman 2009-09-17 06:33:32

Just changing the last predicate to (first.id > second.id) will get rid of the duplicates. This was answered before the OP clarified what they really wanted, so I left it simple.

James Anderson 2009-09-18 03:31:10
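
To illustrate the point about the last predicate, here is a small sketch using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. The sample rows are hypothetical. With `<>` every matching pair shows up twice (once in each order); with `>` it shows up once.

```python
import sqlite3

# Hypothetical sample data: rows 1 and 2 are duplicates on (a, b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO foo (id, a, b) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (3, 2, 2)])

# Same self-join as in the answer; only the comparison on id varies.
JOIN_SQL = """
    SELECT f.id, s.id
      FROM foo AS f
      JOIN foo AS s
        ON f.a = s.a AND f.b = s.b AND f.id {op} s.id
     ORDER BY f.id, s.id
"""

pairs_ne = conn.execute(JOIN_SQL.format(op="<>")).fetchall()
pairs_gt = conn.execute(JOIN_SQL.format(op=">")).fetchall()

print(pairs_ne)  # the pair appears in both orders
print(pairs_gt)  # the pair appears once
```

Note that when three or more rows share the same (a, b), the `>` form still returns one row per pair, so SELECT DISTINCT on a single id column may be needed on top of it.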

Answer 4

+1 A:

Could you please clarify what you need to do ultimately? The best solution may depend on that (e.g., do you simply want to delete all duplicate-key rows?)

One way to handle this table, if all you want is unique-keyed rows (not sure whether MySQL supports it; it's from Sybase):

SELECT MIN(id), A, B FROM FOO GROUP BY A, B

Your exact question (although I'm a bit at a loss as to why you'd need all rows except id=2) is:

SELECT F1.*  
FROM FOO F1 , 
     (SELECT A, B FROM FOO GROUP BY A, B HAVING COUNT(*)>1) F2
WHERE F1.A=F2.A and F1.B=F2.B

To delete all the duplicates, you can for example do

DELETE FOO WHERE NOT EXISTS
(SELECT 1 from
    (SELECT MIN(id) 'min_id' FROM FOO GROUP BY A, B) UNIQUE_IDS 
 WHERE id = min_id)

As an alternative, you can do

  SELECT MIN(id) 'id', A, B INTO TEMPDB..NEW_TABLE 
  FROM FOO GROUP BY A, B

  TRUNCATE TABLE FOO
  -- Drop indices on FOO
  INSERT FOO SELECT * FROM TEMPDB..NEW_TABLE
  -- Recreate indices on FOO

DVK 2009-09-17 05:13:44

My ultimate goal is to remove all the duplicate rows so I can add the UNIQUE constraint.

Paul Tarjan 2009-09-17 05:25:40

@DVK Sadly your query didn't return within 15 minutes on my database so I couldn't evaluate whether it worked. It is a MyISAM table and locked the whole thing up so I didn't want to keep the site down for much longer than 15 mins. The accepted one I can do OFFSET and LIMIT on to chunk the request. I actually combined your solutions to do the temp table using the accepted answer, but I don't have enough rep to edit the answer.

Paul Tarjan 2009-09-17 06:37:40

Answer 5

A:

here's another approach

select * from foo f1 where exists(
  select * from foo f2 where
    f1.id != f2.id and
    f1.a = f2.a and
    f1.b = f2.b )

anyway, even though I find it a bit more readable, if you have such a huge table you should check the execution plan; subqueries have a bad reputation when it comes to performance...

you should also consider creating the index (without the unique clause, obviously) to speed up the query... for huge operations, it's sometimes better to spend the time creating the index, perform the update, and then drop the index... in this case, I guess an index on (a, b) should help a lot...

opensas 2009-09-17 05:15:57

Answer 6

A:

Try this:

With s as (Select a,b from foo group by a,b having Count(1)>1)
Select foo.* from foo,s where foo.a=s.a and foo.b=s.b

This query should show duplicate rows in the table foo.

Himadri 2009-09-17 05:29:05

That would work on DB2, SQL Server 2005+, or Oracle 9i+ - but sadly, not MySQL.

OMG Ponies 2009-09-17 05:31:20

Yes. I have written this query in sql server 2005.

Himadri 2009-09-17 06:15:39

Answer 7

A:

Your stated goal is to remove all duplicate combinations of (a,b). For that, you can use a multi-table DELETE:

DELETE t1
  FROM foo t1
  JOIN foo t2 USING (a, b)
 WHERE t2.id > t1.id

Before you run it, you can check which rows will be removed with:

SELECT DISTINCT t1.id
  FROM foo t1
  JOIN foo t2 USING (a, b)
 WHERE t2.id > t1.id

Because the WHERE clause is t2.id > t1.id, it removes every row for which a row with the same (a, b) and a higher id exists, so within each group only the row with the highest id remains. In your case, only the rows with id equal to 2, 5 or 6 would remain.

Josh Davis 2009-09-17 12:40:14
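
A hedged sketch of the same keep-the-highest-id idea, run with Python's sqlite3 module. SQLite does not accept MySQL's multi-table `DELETE t1 FROM ... JOIN` syntax, so the delete below is rewritten with a correlated EXISTS; that rewrite, the index name, and the sample rows are assumptions for illustration and are not the asker's data.

```python
import sqlite3

# Hypothetical sample data: (1, 1) appears twice, (3, 3) three times, (2, 2) once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO foo (id, a, b) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (3, 2, 2),
                  (4, 3, 3), (5, 3, 3), (6, 3, 3)])

# Preview: ids that have a same-(a, b) row with a higher id.
to_delete = conn.execute("""
    SELECT DISTINCT t1.id
      FROM foo t1
      JOIN foo t2 USING (a, b)
     WHERE t2.id > t1.id
     ORDER BY t1.id
""").fetchall()

# SQLite-compatible equivalent of the multi-table DELETE.
conn.execute("""
    DELETE FROM foo
     WHERE EXISTS (SELECT 1
                     FROM foo AS t2
                    WHERE t2.a = foo.a AND t2.b = foo.b AND t2.id > foo.id)
""")
remaining = conn.execute("SELECT id FROM foo ORDER BY id").fetchall()

# With the duplicates gone, a unique index on (a, b) can be created.
conn.execute("CREATE UNIQUE INDEX foo_a_b ON foo (a, b)")

print(to_delete)
print(remaining)
```

On this sample data the preview lists exactly the ids that the delete then removes, and the highest id of each (a, b) group is what stays.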

Answer 8

A:

If the id value doesn't matter at all in the final product, that is, if you could renumber them all and it would be fine, and if id is a serial column, then just "select distinct" on the two columns into a new table, delete all the data from the old table, and then copy the temporary values back in.

Kev 2009-09-17 12:47:07
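
The select-distinct round trip described above could look roughly like this. It is a sketch against an in-memory SQLite database via Python's sqlite3 module, not MySQL; the temporary table name foo_distinct and the sample rows are hypothetical, and the ids are deliberately allowed to be renumbered.

```python
import sqlite3

# Hypothetical sample data with duplicated (a, b) combinations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO foo (id, a, b) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (3, 2, 2),
                  (4, 3, 3), (5, 3, 3), (6, 3, 3)])

# 1. Copy the distinct (a, b) combinations into a temporary table.
conn.execute("CREATE TEMP TABLE foo_distinct AS SELECT DISTINCT a, b FROM foo")

# 2. Empty the original table.
conn.execute("DELETE FROM foo")

# 3. Copy the distinct rows back; id is left out so it gets reassigned.
conn.execute("INSERT INTO foo (a, b) SELECT a, b FROM foo_distinct ORDER BY a, b")

rows = conn.execute("SELECT a, b FROM foo ORDER BY a, b").fetchall()
print(rows)
```

This only makes sense under the condition stated in the answer: nothing else may depend on the old id values, since they are not preserved.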
