views:

1080

answers:

4

Hi all, I have the following table:

    id       time      text      otheridentifier
    -------------------------------------------
    1        6         apple     4
    2        7         orange    4
    3        8         banana    3
    4        9         pear      3
    5        10        grape     2

What I want to do is select the 3 most recent records (by time desc), whose otheridentifiers are distinct. So in this case, the result would be id's: 5, 4, and 2.

id = 3 would be skipped because there's a more recent record with the same otheridentifier field.

Here's what I tried to do:

SELECT * FROM `table` GROUP BY (`otheridentifier`) ORDER BY `time` DESC LIMIT 3

However, I end up getting rows of id = 5, 3, and 1 instead of 5, 4, 2 as expected.

Can someone tell me why this query wouldn't return what I expected? I tried changing the ORDER BY to ASC but this simply rearranges the returned rows to 1, 3, 5.

Thanks for your help!

+5  A: 

It doesn't return what you expect because grouping happens before ordering, as reflected by the position of the clauses in the SQL statement. You're unfortunately going to have to get fancier to get the rows you want. Try this:

SELECT *
FROM `table`
WHERE `id` = (
    SELECT `id`
    FROM `table` as `alt`
    WHERE `alt`.`otheridentifier` = `table`.`otheridentifier`
    ORDER BY `time` DESC
    LIMIT 1
)
ORDER BY `time` DESC
LIMIT 3
chaos
I remembered the time I spent hours to fix sql like this and turns out mysql 4.0 doesn't support nested queries ;p
Unreality
@Unreality: Fortunately most solutions involving subqueries can be expressed as joins if necessary. :)
Rytmis
Yeah, but ones with ORDER/LIMIT are _not_ easily expressed by JOIN...
Alex Martelli
Wow, that is fancy. Where do I learn to do things like that? Thanks!
Jasie
As much as possible, avoid subqueries because they are slow. Use LEFT JOIN instead :)
marknt15
@Jasie: http://dev.mysql.com/doc/ :) @marknt115: Yes, avoid them as much as possible, but only that much.
chaos
+1  A: 
SELECT * FROM table t1 
WHERE t1.time = 
    (SELECT MAX(time) FROM table t2 
     WHERE t2.otheridentifier = t1.otheridentifier)
Rytmis
How would this select the latest row per otheridentifier?
Andomar
@Andomar: I shouldn't try answering questions when I'm not fully awake. Changed the column names a bit -- see if it makes more sense now. :)
Rytmis
+2  A: 

You could join the table on itself to filter the last entry per otheridentifier, and then take the top 3 rows of that:

SELECT last.*
FROM `table` last
LEFT JOIN prev 
    ON prev.`otheridentifier` = last.`otheridentifier`
    AND prev.`time` < last.`time`
WHERE prev.`id` is null
ORDER BY last.`time` DESC 
LIMIT 3
Andomar
+1  A: 

Andomar's answer is probably best in that it doesn't use a subquery.

An alternative approach:

select *
from   table t1
where  t1.time in (
                  select   max(s2.time)
                  from     table t2
                  group by t2.otheridentifier
                  )

Adam Bernier
I think I see a problem here if the time values aren't unique -- this might return rows that it shouldn't. Suppose there's a time value that is the maximum for one otheridentifier, but is, say, second largest for another otheridentifier. Wouldn't this query then return both otheridentifiers? I may be altogether off though, I'm still a bit tired. :)
Rytmis
@Rytmis: Yeah, and so would my query, and yours:) hehe
Andomar
@Andomar: Hmm, are you sure about my query though? Because I just tested it by adding a row (6, 7, 'strawberry', 3) -- the time value 7 is the largest in the group that has otheridentifier 4, but the second largest in the one that has otheridentifier 3. My query still only returns the rows the OP wanted. Is my test case wrong? :)
Rytmis
@Andomar: Nope, the test case is correct -- this query returns the 'strawberry' row while mine doesn't.
Rytmis
@Rytmis: try it with (6,7,Strawberry,4)
Andomar