views:

520

answers:

2
| one | two |
-------------
| A   | 1   |
| A   | 2   |
| B   | 1   |
| B   | 3   |
| C   | 1   |
| C   | 4   |

I would like to get no repeats in any column in a query, so the standard SELECT DISTINCT one,two FROM table; or SELECT * FROM table GROUP BY one,two; doesn't quite work, because it looks for distinct in all rows, which in this case would return all 6 rows.

Ideally, I am looking for:

| one | two |
-------------
| A   | 1   |
| B   | 3   |
| C   | 4   |

In PHP (etc.), I would just do this with an array for each column, and if any column has been used before then skip the row. I am not sure how to implement that in MySQL, though.

SELECT * FROM (SELECT * FROM table GROUP BY one) GROUP BY two- almost works. but because the outer query doesn't see all the alternatives, it will miss valid options, i.e. the inner will collapse to A,B,C but could well pick all 1s for column two, which would mean the second GROUP BY would then collapse it own to 1 row!

I know the order of duplication checking will have an effect on the exact rows returned -- not worried about that -- I just want a good cross section of rows with minimal similar rows.

A: 

There's no way to do this in SQL: you can have six rows (every unique tuple), five rows (every first use of each column value) or one row (every first use of each value in each column which appears in both columns).

The reason you're having such a difficult time explaining what you want is that it's based on a human judgement call. You won't be able to do this in SQL until you're able to describe it qualitatively in English, and what you want isn't qualitative, it's procedural.

There are a bunch of ways to approximate it, such as grouping by the lesser column then sorting by match count inverse, but they're all exploitable.

Until you can give an unambiguous, logic driven criterion for selection, this will not succeed. Saying "minimal" doesn't count until you define minimal, and the minimal you appear to want requires procedural aggregate behvaior, which you cannot get in MySQL.

Probably the closest you'll get which doesn't incorrectly exclude rows is select distinct * from (select * from foo group by one) as l union all (select * from foo group by two) as r;
Thanks for the reply, admit don't understand the difference between qualitative and procedural - will attempt to read up on those terms...
barryhunter
A: 

Well as it turns out I found an answer ;)

CREATE TEMPORARY TABLE table2 ENGINE HEAP SELECT * FROM table;

ALTER IGNORE TABLE table2 ADD UNIQUE (one), ADD UNIQUE (two);

SELECT * FROM table2;

The IGNORE in the alter table is important, as it simply discards any duplicate rows based on the unique indexe**s**.

(not sure why didn't think of this before - as used it to good effect in solving "order before group by" style queries!)

barryhunter
or course in the real query have a WHERE and ORDER BY on the initial select, which makes it useful. Experimenting with different order by's, RAND() works well.
barryhunter