views:

183

answers:

3

My table looks like this with duplicates in col1

col1,      col2,      col3,      col4
1,            1,            0,            a
1,            2,            1,            a
1,            3,            1,            a
2,            4,            1,            b
3,            5,            0,            c  

I want to select distinct col1 with max (col3) and min(col2); so result set will be:

col1,      col2,      col3,      col4
1,            2,            1,            a
2,            4,            1,            b
3,            5,            0,            c

I have a solution but looking for best ideas?

+3  A: 
SELECT col1, MAX(col3) AS col3, MIN(col2) AS col2, MAX(col4) AS col4
FROM MyTable
GROUP BY col1;

You showed in your example that you wanted a col4 included, but you didn't say which value you want. You have to put that column either in an aggregate function or in the GROUP BY clause. I assumed that taking the max for the group would be acceptable.


update: Thanks for the clarification. You're asking about a variation of the greatest-n-per-group problem that comes up frequently on Stack Overflow. Here's my usual solution:

SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t3
 ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
WHERE t3.col1 IS NULL;

In English: show me the row (t1) for which no row exists with the same col1 and a greater value in col3. Some people write this using a NOT EXISTS subquery predicate, but I prefer the JOIN syntax.

Here's the output from my test given your example data:

+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
|    1 |    2 |    1 | a    |
|    1 |    3 |    1 | a    |
|    2 |    4 |    1 | b    |
|    3 |    5 |    0 | c    |
+------+------+------+------+

Notice that there are two rows for col1 value 1, because both rows satisfy the join condition; no other row exists with a greater value in col3.

So we need to add another condition to resolve the tie. You want to compare to rows with a lesser value in col2 and if no such rows exist, then we've found the row with the least value in col2.

SELECT t1.*
FROM MyTable t1
LEFT OUTER JOIN MyTable t3
 ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
LEFT OUTER JOIN MyTable t2
 ON t1.col1 = t2.col1 AND t1.col3 = t2.col3 AND t1.col2 > t2.col2
WHERE t2.col1 IS NULL AND t3.col1 IS NULL; 

Here's the output from my test given your example data:

+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
|    1 |    2 |    1 | a    |
|    2 |    4 |    1 | b    |
|    3 |    5 |    0 | c    |
+------+------+------+------+

PS: By the way, it's customary on Stack Overflow to edit your original question and add detail, instead of adding answers to your own question that only clarify the question. But I know some actions aren't available to you until you have more than 1 reputation point.

Bill Karwin
Gets my vote for not having an unnecessary table alias, but you miss out on bonus points in not naming your column outputs :)
sidereal
@sidereal: Point taken. I have edited to add column aliases.
Bill Karwin
Thanks Bill! It works for me
A: 
The queries suggested won't fetch the desired result set I have mentioned above.

To be more clear for rows with duplicate col1 (and hence col 4) I want the row that has max col3 value(which will always have 1 or 0) and min of corresponding col2 values of the duplicate set.

So for the set below
col1,      col2,      col3,      col4 
1,            1,            0,            a 
**1,            2,            1,            a** 
1,            3,            1,            a 

I want row 2
col1,      col2,      col3,      col4 
1,            **2**,      1,       a 

and not
col1,      col2,      col3,      col4 
1,            **1**,      1,       a 

Hope I am understood.
A: 

Hello,

As a new requirement arises to get another column result from the table, so is it required to perform another self join on the table or is there any other way?

My table is this

id,      time,      Active,      Name  
1,           1,            0,            a  
**1,            2,            1,            a**  
1,            3,            1,            a 

I am using the query below to get the result. Is there any other better way? I mean any simpler!

select
id, min([TimeStamp]), Active from (
select
m.* from
SelfJoinTest m inner
join ( 
select Id, max(Active) as MaxActive 
from SelfJoinTest 
group by Id )
mm on m.Id = mm.Id and m.Active = mm.MaxActive)As T group by T.id, T.Active