tags:

views:

29

answers:

4

Hi All,

I have a table defined as:

create table Foo (
    X integer,
    Y timestamp,
    Z varchar(255),
    primary key (X, Y)
);

For a given X I want to find the most recent row. I've come up with the following so far, but would like to know if anyone knows something that will perform better or is just "interesting" such as using a left join on itself (which I haven't been able to get to work).

select * from Foo where X=1234 order by Y desc limit 0,1;
select * from Foo where X=1234 and Y in  (select max(Y) from Foo where X=1234);

Thanks!

+2  A: 

Your second solution will work but the performance might not be optimal and it could return more than one row. You should also change IN to =:

SELECT * FROM Foo
WHERE X = 1234 AND Y = (SELECT MAX(Y) FROM Foo WHERE X = 1234)

I think the first soluton is better:

SELECT *
FROM Foo
WHERE X = 1234
ORDER BY Y DESC
LIMIT 1

Add an index on X, Y to get decent performance.

Mark Byers
+2  A: 

select most recent row for each user (which had events)

    select * from events e
    join (select UserId, Max(time) time from events group by userId) t
    ON (e.userId = t.userid and e.time = t.time)
Michael Pakhantsov
I don't think this works correctly. The 'select userId, max(time)' could return a value that does not correspond to a primary key in the table, i.e. it could give user id 1, with the max time for user 2.
Scruffers
@Scruffers, How 'primary key' and 'group by' related here?? Does not matter 'time' field is part of primary key or not.
Michael Pakhantsov
I think I read your solution before you edited it to include 'group by'. I think that works now. Thx.
Scruffers
+1  A: 

In your second example you're using a correlated subquery. You could use a non-correlated subquery (derived table) instead. This will be faster if you happen to list all the foos, because a correlated subquery would have to be called once for each row of the outer query.

This is an example using a derived table:

SELECT  foo.*
FROM    foo
JOIN    (
           SELECT    MAX(Y) max_time, X
           FROM      foo
           GROUP BY  X
        ) d_foo ON (d_foo.X = foo.X AND
                    d_foo.max_time = foo.Y);

Test case:

INSERT INTO foo VALUES (1, '2010-01-01 12:00:00', '1');
INSERT INTO foo VALUES (1, '2010-01-03 12:00:00', '2');
INSERT INTO foo VALUES (2, '2010-01-05 12:00:00', '3');
INSERT INTO foo VALUES (2, '2010-01-02 12:00:00', '4');
INSERT INTO foo VALUES (3, '2010-01-08 12:00:00', '5');
INSERT INTO foo VALUES (4, '2010-01-03 12:00:00', '6');
INSERT INTO foo VALUES (4, '2010-01-04 12:00:00', '7');

Result:

+---+---------------------+------+
| X | Y                   | Z    |
+---+---------------------+------+
| 1 | 2010-01-03 12:00:00 | 2    |
| 2 | 2010-01-05 12:00:00 | 3    |
| 3 | 2010-01-08 12:00:00 | 5    |
| 4 | 2010-01-04 12:00:00 | 7    |
+---+---------------------+------+
4 rows in set (0.02 sec)

However, if you will always be restricting your result to just one X, your solutions are probably fine. Check out @Mark Byers' answer for further tips on this.

Daniel Vassallo
A: 

here is how you will do it:

SELECT *
FROM foo INNER JOIN (SELECT `x`, MAX(`y`) AS `y` FROM foo GROUP BY `x`) AS foo2 ON foo.x = foo2.x AND foo.y = foo2.y
ovais.tariq