views:

48

answers:

2

I have a table named 'Attendance' which is used to record student attendance time in courses. This table has 4 columns, say 'id', 'course_id', 'attendance_time', and 'student_name'. An example of few records in this table is:

23    100    1/1/2010 10:00:00    Tom

24    100    1/1/2010 10:20:00    Bob

25    187    1/2/2010 08:01:01    Lisa

.....

I want to create a summary of the latest attendance time for each course. I created a query below:

SELECT course_id, max(attendance_time) FROM attendance GROUP BY course_id

The result would be something like this

100    1/1/2010 10:20:00

187    1/2/2010 08:01:01

Now, all I want to do is add the 'id' column to the result above. How to do it?

I can't just change the command to something like this

SELECT id, course_id, max(attendance_time) FROM attendance GROUP BY id, course_id

because it would return all the records as if the aggregate function is not used. Please help me.

+2  A: 

This is a typical 'greatest per group', 'greatest-n-per-group' or 'groupwise maximum' query that comes up on Stack Overflow almost every day. You can search Stack Overflow for these terms to find many different examples of how to solve this with different databases. One way to solve it is as follows:

SELECT
    T2.course_id,
    T2.attendance_time
    T2.id
FROM (
    SELECT
        course_id,
        MAX(attendance_time) AS attendance_time
    FROM attendance
    GROUP BY course_id
) T1
JOIN attendance T2
ON T1.course_id = T2.course_id
AND T1.attendance_time = T2.attendance_time

Note that this query can in theory return multiple rows per course_id if there are multiple rows with the same attendance_time. If that cannot happen then you don't need to worry about this issue. If this is a potential problem then you can solve this by adding an extra grouping on course_id, attendance_time and selecting the minimum or maximum id.

Mark Byers
Thanks. It works!
Haris
A: 

What do you need the additional column for? It already has a course ID, which identifies the data. A synthetic ID to the query would be useless because it does not refer to anything. If you want to get the max from the query results for a single course, then you can add a where condition like this:

SELECT course_id, max(attendance_time) FROM attendance GROUP BY course_id **WHERE course_id = your_id_here**;

If you mean that the column should be named 'id', you can alias it in the query:

SELECT course_id **AS id**, max(attendance_time) FROM attendance GROUP BY course_id;

You could make a view out of your query to easily access the aggregate data:

CREATE VIEW max_course_times AS SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;

SELECT * FROM max_course_times;

jmz
-1, read on *greatest-n-per-group* to get the pattern requested here (or Mark's answer).
Unreason
Actually, my case above is only a simplification of a real more complex case. I can't find other example case than this attendance summary problem.
Haris
@Haris, my comment above was to jmz (explaining why -1). I don't understand your comment - browsing so with a given tag will give many more answers. Also, you might want to check http://dev.mysql.com/doc/refman/5.1/en/example-maximum-column-group-row.html
Unreason