I have a messages table which looks like this:

+------------+-------------+----------+
| sender_id  |  created_at | message  |
+------------+-------------+----------+
|      1     | 2010-06-14  | the msg  |
|      1     | 2010-06-15  | the msg  |
|      2     | 2010-06-16  | the msg  |
|      3     | 2010-06-14  | the msg  |
+------------+-------------+----------+

I want to select the single most recent message for each sender.

This seems like a GROUP BY sender_id and ORDER BY created_at but I'm having trouble getting the most recent message selected.

I'm using Postgres, so if I want to order by created_at I need an aggregate function on that field in the SELECT statement. As an initial test I was looking at something like this:

SELECT messages.sender_id, MAX(messages.created_at) as the_date 
FROM messages 
GROUP BY sender_id 
ORDER BY the_date DESC 
LIMIT 10;

This seems to work, but when I want to select 'message' as well I have no idea what aggregate function to use on it. I basically just want the message that corresponds to the MAX created_at.

Is there some way of getting at this or am I approaching it the wrong way?

+3  A: 

This:

SELECT  *
FROM    (
        SELECT  DISTINCT ON (sender_id) *
        FROM    messages 
        ORDER BY
                sender_id, created_at DESC 
        ) q
ORDER BY
        created_at DESC
LIMIT 5

or this:

SELECT  (mi).*
FROM    (
        SELECT  (
                SELECT  mi
                FROM    messages mi
                WHERE   mi.sender_id = m.sender_id
                ORDER BY
                        created_at DESC
                LIMIT 1
                ) AS mi
        FROM    messages m
        GROUP BY
                sender_id
        ) q
ORDER BY
        (mi).created_at  DESC
LIMIT 5

Create an index on (sender_id, created_at) for this to work fast.
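
The index suggested above could be created roughly like this. This is a minimal sketch: the index names are hypothetical, and the second variant is only an assumption about what might match the DISTINCT ON ordering more closely, so it is worth comparing both with EXPLAIN on real data.

```sql
-- Composite index on the two columns named in the answer.
-- "messages_sender_created_idx" is a made-up name for illustration.
CREATE INDEX messages_sender_created_idx
    ON messages (sender_id, created_at);

-- Optional variant (an assumption, not part of the original answer):
-- store created_at in descending order so the index order matches
-- ORDER BY sender_id, created_at DESC.
CREATE INDEX messages_sender_created_desc_idx
    ON messages (sender_id, created_at DESC);
```

Only one of the two indexes should be needed; keeping both would just add write overhead.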

You may find this article interesting:

Quassnoi
This one gives a different result, the sort order is different.
Frank Heikens
@Frank: you can use this as a subquery and reorder the results in your outer query.
Kevin
@Frank: the original question did not mention the order on `sender_id`, but here is the query with the corrected order.
Quassnoi
In relation to the article, I can't use any of the 8.4 methods like window functions as I'm on 8.3. The 2nd query you gave there gives an error (syntax error at or near "mi"). I need the order to be by date, so I can't use the DISTINCT ON idea there either.
johnnymire
@johnnymire: see the post update. As for the second query, which line does it give the error on?
Quassnoi
PGError: ERROR: syntax error at or near "mi" LINE 1: ...id = m.sender_id ORDER BY the_date DESC LIMIT 1 ) mi FROM ..
johnnymire
@john: see the post update
Quassnoi
Great, that's done it, thanks! I'll experiment with the indexes to try to speed it up.
johnnymire
+1  A: 

Use a correlated sub query:

select * from messages m1 
where m1.created_at = (
    select max(m2.created_at) 
    from messages m2 
    where m1.sender_id = m2.sender_id
);

The sub query is reevaluated for each row processed by the upper query.
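
To try the correlated subquery approach against the sample rows from the question, something like the following should work. It is only a sketch: the temporary table definition and column types are assumptions based on the table shown in the question, and the ORDER BY / LIMIT at the end is added to get the newest-first ordering the question asks for.

```sql
-- Assumed schema, reconstructed from the table in the question.
CREATE TEMP TABLE messages (
    sender_id  integer,
    created_at date,
    message    text
);

-- Sample rows copied from the question.
INSERT INTO messages (sender_id, created_at, message) VALUES
    (1, '2010-06-14', 'the msg'),
    (1, '2010-06-15', 'the msg'),
    (2, '2010-06-16', 'the msg'),
    (3, '2010-06-14', 'the msg');

-- Keep only rows whose created_at equals the latest created_at
-- for the same sender, newest first.
SELECT m1.sender_id, m1.created_at, m1.message
FROM messages m1
WHERE m1.created_at = (
    SELECT MAX(m2.created_at)
    FROM messages m2
    WHERE m2.sender_id = m1.sender_id
)
ORDER BY m1.created_at DESC
LIMIT 10;
```

With this data the query returns one row per sender: sender 2 (2010-06-16), sender 1 (2010-06-15), then sender 3 (2010-06-14). Note that if a sender has two messages with the same latest created_at, both rows come back, so this is not a strict "one row per sender" guarantee.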

Kevin
So if you know that, why would you even suggest it? It is a horrible idea...
Evan Carroll
A: 

Use distinct on:

SELECT  DISTINCT ON (sender_id)
        sender_id, created_at, message
FROM    messages
ORDER BY
        sender_id, created_at DESC
Gavin