views:

619

answers:

2

I have a table of user access sessions which records website visitor activity:

accessid, userid, date, time, url

I'm trying to retrieve all distinct sessions for userid 1234, as well as the earliest date and time for each of those distinct sessions.

SELECT 
 DISTINCT accessid, 
 date, 
 time 
FROM 
 accesslog 
WHERE userid = '1234' 
GROUP BY accessid

This gives me the date and time of a random row within each distinct accessid. I've read a number of posts recommending the use of min() and max(), so I tried:

SELECT DISTINCT accessid, MIN(DATE) AS date, MIN(TIME) AS time FROM accesslog WHERE userid = '1234' GROUP BY accessid ORDER BY date DESC, time DESC

... and even...

SELECT DISTINCT accessid, MIN(CONCAT(DATE, ' ', TIME)) AS datetime FROM accesslog WHERE userid = '1234' GROUP BY accessid ORDER BY date DESC, time DESC

... but I never get the correct result of the earliest date and time.

What is the trick to ordering this kind of query?

EDIT -

Something weird is happening....

The code posted below by Bill Karwin correctly retrieves the earliest date and time for sessions that started in 2009-09. But, for sessions that began on some day in 2009-08, the time and date for the first hit occurring in the current month is what is returned. In other words, the query does not appear to be spanning months!

Example data set:

accessid  | userid  | date    | time
1   | 1234  | 2009-08-15 | 01:01:01
1   | 1234  | 2009-09-01 | 12:01:01
1   | 1234  | 2009-09-15 | 13:01:01
2   | 1234  | 2009-09-01 | 14:01:01
2   | 1234  | 2009-09-15 | 15:01:01

At least on my actual data table, the query posted below finds the follow earliest date and time for each of the two accessid's:

accessid  | userid  | date    | time
1   | 1234  | 2009-09-01 | 12:01:01
2   | 1234  | 2009-09-01 | 14:01:01

... and I would guess that the only reason the result for accessid 2 appears correct is because it has no hits in a previous month.

Am I going crazy?

EDIT 2 -

The answer is yes, I am going crazy. The query works on the above sample data when placed in a table of duplicate structure.

Here is the (truncated) original data. I included the very first hit, another hit in the same month, the first hit of the next month, and then the last hit of the month. The original data set has many more hits in between these points, for a total of 462 rows.

accessid       | date   | time
cbb82c08d3103e721a1cf0c3f765a842 | 2009-08-18 | 04:01:42
cbb82c08d3103e721a1cf0c3f765a842 | 2009-08-23 | 23:18:52
cbb82c08d3103e721a1cf0c3f765a842 | 2009-09-17 | 05:12:16
cbb82c08d3103e721a1cf0c3f765a842 | 2009-09-18 | 06:29:59

... the query returns the 2009-09-17 value as the earliest value when the original table is queried. But, when I copy the ........ oh, balls.

It's because the hits from 2009-08% have an empty userid field.

+1  A: 

This is a variation of the "greatest-n-per-group" problem that comes up on StackOverflow several times per week.

SELECT 
        a1.accessid, 
        a1.date, 
        a1.time 
FROM 
        accesslog a1
LEFT OUTER JOIN
        accesslog a2
  ON (a1.accessid = a2.accessid AND a1.userid = a2.userid
    AND (a1.date > a2.date OR a1.date = a2.date AND a1.time > a2.time))
WHERE a1.userid = '1234'
  AND a2.accessid IS NULL;

The way this works is that we try to find a row (a2) that has the same accessid and userid, and an earlier date or time than the row a1. When we can't find an earlier row, then a1 must be the earliest row.


Re your comment, I just tried it with the sample data you provided. Here's what I get:

+----------+------------+----------+
| accessid | date       | time     |
+----------+------------+----------+
|        1 | 2009-08-15 | 01:01:01 | 
|        2 | 2009-09-01 | 14:01:01 | 
+----------+------------+----------+

I'm using MySQL 5.0.75 on Mac OS X.

Bill Karwin
i believe you, but its not working! its the strangest thing... the query is finding the earliest date and time *this month*... for sessions that began in a previous month, the start date is seen as the earliest time on the first day a hit was recorded in the current month only. i'll update the post with more details.
Ian
lovely, with the sample data it works for me as well...
Ian
thanks bill, i'm an idiot. the actual earliest date had a blank userid so was being excluded. :)
Ian
Aha! Yep, it's always worth double-checking the data. Cheers!
Bill Karwin
A: 

Try this

SELECT 
    accessid, 
    date, 
    time 
FROM 
    accesslog 
WHERE userid = '1234' 
GROUP BY accessid
HAVING MIN(date)

It will return all unique accesses with minimum time for each for userid = '1234'.

martin.malek
HAVING restricts groups, not rows within a group.
Bill Karwin