views:

22

answers:

1

I have a table like

      date               user_id          page_id
2010-06-19 16:00:00         1                4
2010-06-19 16:00:00         3                4
2010-06-20 07:10:00         1                1
2010-06-20 12:00:10         1                2
2010-06-20 12:00:10         1                3
2010-06-20 13:05:00         2                1
2010-06-20 14:10:00         3                1
2010-06-21 17:00:00         2                1         

I want to write a query that will return the last page_id for those users who haven't visited in the last day.

So, I can find who hasn't visited in the last day with:

SELECT user_id, MAX(page_id) 
FROM page_views GROUP BY user_id 
HAVING MAX(date) < DATE_SUB(NOW(), INTERVAL 1 DAY);

However, how can I find the last viewed page_id for these users? i.e. I want to know which page_id corresponds to the value in the same row as MAX(date). In the case where there are multiple page views per date, I can just select the MAX(page_id).

The expected output from above should be (if NOW() returns 2010-06-21 18:00:00):

user_id      page_id
   1            3
   3            1
  • user_id 1 last visited over a day ago at 2010-06-20 12:00:10, and the MAX(page_id) was 3.
  • user_id 2 last visited less than a day ago, so they are ignored.
  • user_id 3 last visited over a day ago, and their most recent page_id was 1.

How can I achieve this? I need to use only SQL. I'm using a MySQL derivative that requires all columns in the SELECT clause to be declared in the GROUP BY clause (it's a little more standards compliant).

Thanks.

+1  A: 

I could see different approaches. For example:

select a.user_id, a.page_id
from page_views a
inner join (SELECT user_id, MAX(date) as date
FROM page_views GROUP BY user_id 
HAVING MAX(date) < DATE_SUB(NOW(), INTERVAL 1 DAY) ) b on a.user_id = b.user_id 
    and a.date = b.date

It could be implemented more effective in MS SQL or Oracle with windowed functions.

Another idea:

select a.user_id, a.page_id
from page_views a
where date < DATE_SUB(NOW(), INTERVAL 1 DAY)
    and not exist(select 1 from page_views b 
        where a.user_id = b.user_id and b.date > a.date)
I do not know is it possible to have several page_id for the same user_id in the same time.If yes, you possibly need to add group by to this query
This is great, thanks. I didn't know you could join on multiple columns, and I'd forgotten about the EXIST keyword.
Michael Jones