views:

51

answers:

3

Suppose you had the mySQL table describing if you can mix two substances

Product   A    B    C
---------------------
A         y    n    y
B         n    y    y
C         y    y    y

The first step would be to transform it like

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    A    y
B    B    y
B    C    n
C    A    y
C    B    n
C    C    y

But then you have duplicate information. (eg. If A can mix with B, then B can mix with A), so, you can remove several rows to get

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    B    y
B    C    n
C    C    y

While the last step was pretty easy with a small table, doing it manually would take forever on a larger table. How would one go about automating the removal of rows with duplicate MEANING, but not identical content?

Thanks, I hope my question makes sense as I am still learning databases

+3  A: 

If it's safe to assume that you're starting with all relationships doubled up, e.g.

If A B is in the table, then B A is guaranteed to be in the table.

Then all you have to do is remove all rows where P2 < P1;

DELETE FROM `table_name` WHERE `P2` < `P1`;

If this isn't the case, you can make it the case by going through the table and inserting all the duplicate rows if they don't already exist, then running this.

Jamie Wong
I'm glad you found my answer helpful - but I would recommend waiting longer for responses before accepting. It's quite possible that there's a much better solution than this, or there's a critical flaw in mine.
Jamie Wong
good point. i will wait.
thomas
good answer -- as long as the assumption that all relationships are doubled holds true. If you can't be sure the assumption will hold, though, you should check for duplicates before deleting.
dave
All relationships are doubled, I generated the table with a nested for loop.Thanks again
thomas
+1  A: 

Step 1 (as you've already done): Transform to Table2

P1   P2   ?
-----------
A    A    y
A    B    n
A    C    y
B    A    y
B    B    y
B    C    n
C    A    y
C    B    n
C    C    y

Step 2: ReOrder Columns, Select Distinct

SELECT DISTINCT
   IF P1<P2 THEN P1 ELSE P2 END as P1, -- this puts the smallest value in P1
   IF P1>P2 THEN P1 ELSE P2 END as P2 -- this puts the largest value in P2
FROM Table2
WHERE NOT P1=P2  --(Assuming records like A, A, y are not interesting)

I'm not a mySQL guy, so you might need to check the if/then syntax, but this seems conceptually ok anyway.

dave
dave
I will and thanks. (internship)
thomas
+2  A: 

I don't think it's necessary in your situation, but as an intellectual exercise, you could build on Jamie Wong's solution and prevent non-duplicated columns from being removed with an EXISTS clause. Something like this:

DELETE FROM `table_name` AS t1
  WHERE `P2` < `P1`
    AND EXISTS (SELECT NULL FROM `table_name` AS t2
      WHERE t1.`P1` = t2.`P2` AND t1.`P2` = t2.`P1`);

It pretty much just makes sure that there's a duplicate before deleting anything.

(My MySQL syntax might be a little off; it's been a while.)

WCWedin
good point. If you use the delete approach, you should check for duplicates before deleting. You can avoid deleting and the duplicate check altogether, though (see my answer above).
dave
@dave Probably shouldn't say "above". Unless you're that certain it will receive more upvotes :P
Jamie Wong
I will take this into account. Thanks
thomas
haha -- yep -- definitely not above anymore :-)
dave