Hi - I have a table that stores the page hits on a web application, with the following columns:

unique_row_id  http_session_id  page_name   page_hit_timestamp
----------------------------------------------------------------
0              123456789        index.html  2010-01-20 15:00:00
1              123456789        info.html   2010-01-20 15:00:05
2              123456789        faq.html    2010-01-20 15:00:15
3              987654321        index.html  2010-01-20 16:00:00
4              987654321        faq.html    2010-01-20 16:00:05
5              987654321        info.html   2010-01-20 16:00:15
6              111111111        index.html  2010-01-20 16:01:00
7              111111111        faq.html    2010-01-20 16:01:05
8              111111111        info.html   2010-01-20 16:01:15

I want to run a sql query that will show me the most common page that users end browsing on.

So my initial thinking is that in my (Java) app, I can run a query that selects the distinct http_session_id values from the table, and then for each distinct http_session_id, run another query that gets the page with the latest page_hit_timestamp, keeping a running total for each of these pages. (For the sample data above, I'd have a count of 2 for info.html and a count of 1 for faq.html.)
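A minimal sketch of that two-step approach, written in Python against an in-memory SQLite database rather than the Java app, purely for illustration. The table name page_hits and the function name are assumptions; the question does not name the table.

```python
import sqlite3
from collections import Counter

# Sample rows from the question. The table name "page_hits" is an assumption.
ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]


def exit_page_counts_two_step(conn):
    """Count each session's last page, issuing one extra query per session."""
    counts = Counter()
    sessions = conn.execute(
        "SELECT DISTINCT http_session_id FROM page_hits"
    ).fetchall()
    for (session_id,) in sessions:
        # ISO-formatted timestamps sort correctly as text.
        row = conn.execute(
            "SELECT page_name FROM page_hits "
            "WHERE http_session_id = ? "
            "ORDER BY page_hit_timestamp DESC LIMIT 1",
            (session_id,),
        ).fetchone()
        counts[row[0]] += 1
    return counts


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_hits ("
    "unique_row_id INTEGER PRIMARY KEY, http_session_id TEXT, "
    "page_name TEXT, page_hit_timestamp TEXT)"
)
conn.executemany("INSERT INTO page_hits VALUES (?, ?, ?, ?)", ROWS)

two_step_counts = exit_page_counts_two_step(conn)
print(two_step_counts.most_common())  # [('info.html', 2), ('faq.html', 1)]
```

This works, but it issues one query per session on top of the first one, which is the overhead a single statement would avoid.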

But what I'd like to know is this: is there a way to combine these two queries into a single SQL statement, or would I have to go down the stored procedure route for that?

I've had a look at using a join, but I can't figure out if it's applicable in this scenario.

PS - I know that I could use the likes of Google Analytics in my app to provide this info for me, but a) this is a mobile web app, so not a great fit for off-the-shelf analytics tools, and b) I'm just curious to know if this can be done in SQL.

A: 

Can you provide your two queries? I could easily turn them into a JOIN for you, or possibly a subquery, depending on your needs.

MindStalker
+4  A: 

This should do what you want:

select l.page_name, count(*) as ExitPageCount
from WebLog l
inner join (
    -- latest hit per session
    select http_session_id, max(page_hit_timestamp) as page_hit_timestamp
    from WebLog
    group by http_session_id
) lm on l.http_session_id = lm.http_session_id and l.page_hit_timestamp = lm.page_hit_timestamp
group by l.page_name
RedFilter
@OrbMan - Thanks for the astoundingly quick response. I just had to add "as page_hit_timestamp" in your inner join select to get this to work.
Kevin
Right, I always forget that when coding from memory :)
RedFilter
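For anyone wanting to check the join-based query, including the "as page_hit_timestamp" alias mentioned in the comments, here is a rough sketch that runs it against the question's sample data. It uses Python with an in-memory SQLite database, which is an assumption made for illustration; the original poster's database is not stated. The ORDER BY clause is also an addition, so that the most common exit page comes first.

```python
import sqlite3

# Sample rows from the question, loaded into a table named WebLog as in the answer.
ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]

# The derived table lm holds each session's latest timestamp; joining it back
# to WebLog keeps only the row for the last page hit in each session.
EXIT_PAGE_SQL = """
SELECT l.page_name, COUNT(*) AS ExitPageCount
FROM WebLog l
INNER JOIN (
    SELECT http_session_id, MAX(page_hit_timestamp) AS page_hit_timestamp
    FROM WebLog
    GROUP BY http_session_id
) lm ON l.http_session_id = lm.http_session_id
    AND l.page_hit_timestamp = lm.page_hit_timestamp
GROUP BY l.page_name
ORDER BY ExitPageCount DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WebLog ("
    "unique_row_id INTEGER PRIMARY KEY, http_session_id TEXT, "
    "page_name TEXT, page_hit_timestamp TEXT)"
)
conn.executemany("INSERT INTO WebLog VALUES (?, ?, ?, ?)", ROWS)

join_result = conn.execute(EXIT_PAGE_SQL).fetchall()
print(join_result)  # [('info.html', 2), ('faq.html', 1)]
```

One caveat: if a session has two hits with exactly the same latest timestamp, both rows match the join and that session is counted twice.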
+3  A: 
SELECT http_session_id, page_name, COUNT(page_name), MAX(page_hit_timestamp)
    FROM table
    GROUP BY http_session_id, page_name

This will return a row for each http_session_id and page_name combination, and that row will contain:

  • http_session_id
  • page_name
  • the count of how many times the (http_session_id+page_name) combination occurs in the table
  • the latest (MAX) timestamp for the combination
Bandi-T
Thanks - not precisely what I need, but this query gives me some clues for future work I need to get done.
Kevin
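To see why this query is not quite the exit-page count, here is a small sketch that runs it on the sample data. Python with in-memory SQLite is used for illustration only, and the table name page_hits is an assumption standing in for the placeholder "table".

```python
import sqlite3

# Sample rows from the question. "page_hits" is an assumed table name.
ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_hits ("
    "unique_row_id INTEGER PRIMARY KEY, http_session_id TEXT, "
    "page_name TEXT, page_hit_timestamp TEXT)"
)
conn.executemany("INSERT INTO page_hits VALUES (?, ?, ?, ?)", ROWS)

# One row per (session, page) pair, with its hit count and latest timestamp.
grouped_rows = conn.execute(
    """
    SELECT http_session_id, page_name, COUNT(page_name), MAX(page_hit_timestamp)
    FROM page_hits
    GROUP BY http_session_id, page_name
    """
).fetchall()

for row in grouped_rows:
    print(row)
print(len(grouped_rows), "rows")
```

In the sample data every session visits each page exactly once, so this returns nine rows with a count of 1 each. The last page of each session still has to be picked out by comparing the MAX timestamps within a session.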
A: 

The query below lists the last page accessed in each session:

select http_session_id, page_name, page_hit_timestamp
from (
    select row_number() over (partition by t.http_session_id order by t.page_hit_timestamp desc) rn, t.*
    from weblog t
) ranked
where rn = 1;


If you want the counts, the query below may help:

select page_name, count(*)
from (
    select row_number() over (partition by t.http_session_id order by t.page_hit_timestamp desc) rn, t.*
    from weblog t
) ranked
where rn = 1
group by page_name;
Abdullah Dogan
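The row_number() approach can be tried out on the sample data as well. This is a sketch under a few assumptions: it uses Python with in-memory SQLite, which needs SQLite 3.25 or later for window functions, and the derived-table alias (ranked), the exit_count column name and the ORDER BY are additions for readability.

```python
import sqlite3

# Sample rows from the question, loaded into a table named weblog as in the answer.
ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]

# rn = 1 marks the most recent hit within each session (newest first).
# Requires window-function support (SQLite 3.25+ in this sketch).
WINDOW_SQL = """
SELECT page_name, COUNT(*) AS exit_count
FROM (
    SELECT t.page_name,
           ROW_NUMBER() OVER (
               PARTITION BY t.http_session_id
               ORDER BY t.page_hit_timestamp DESC
           ) AS rn
    FROM weblog t
) ranked
WHERE rn = 1
GROUP BY page_name
ORDER BY exit_count DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weblog ("
    "unique_row_id INTEGER PRIMARY KEY, http_session_id TEXT, "
    "page_name TEXT, page_hit_timestamp TEXT)"
)
conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?)", ROWS)

window_result = conn.execute(WINDOW_SQL).fetchall()
print(window_result)  # [('info.html', 2), ('faq.html', 1)]
```

Unlike the join on the maximum timestamp, row_number() assigns rn = 1 to exactly one row per session, so a session with tied timestamps is still counted once (which of the tied rows wins is not defined).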