views:

48

answers:

2

I've been tasked to do generate some reports on our Request Tracker usage. Request Tracker is a ticketing system we use for several departments were I work. To do this I'm taking a nightly snapshot of details about tickets altered for the day into another database. This approach decouples my reporting from the the internal database schema that RT uses.

Amongst many other questions for the report, I need to report how many tickets were resolved in each month per Department. In RT the department is stored as a CustomField, and my modelling follows that trend, as you can see in my query below. However due to how I'm grabbing snapshots each night, I have multiple rows for a ticket, and the Department field can change over the month. I'm only interested in the most recent Department field. I don't know how to get that in a query.

I know I can use 'GROUP BY' to reduce my query results down to one per ticket, but when I do that, I don't know how to grab the last Department setting. As the Departments are all strings, a MAX() doesnt't get the last one. MySQL doesn't require you to use an aggregating function for fields you're selecting, but the results are indeterminate (from my testing it looks like it might grab the first one on my version of MySQL).

To illustrate, here is the results from a query that shows me two tickets, and all it's Department field settings:

"ticket_num","date","QueueName","CF","CFValue","closed"
35750,"2009-09-22","IT_help","Department","",""
35750,"2009-09-23","IT_help","Department","",""
35750,"2009-09-24","IT_help","Department","",""
35750,"2009-09-25","IT_help","Department","",""
35750,"2009-09-26","IT_help","Department","",""
35750,"2009-10-02","IT_help","Department","",""
35750,"2009-10-03","IT_help","Department","",""
35750,"2009-10-12","IT_help","Department","",""
35750,"2009-10-13","IT_help","Department","",""
35750,"2009-10-26","IT_help","Department","Conference/Visitors","2009-10-26 10:10:32"
35750,"2009-10-27","IT_help","Department","Conference/Visitors","2009-10-26 10:10:32"
36354,"2009-10-20","IT_help","Department","",""
36354,"2009-10-21","IT_help","Department","",""
36354,"2009-10-22","IT_help","Department","FS Students",""
36354,"2009-10-23","IT_help","Department","FS Students",""
36354,"2009-10-26","IT_help","Department","FS Students","2009-10-26 12:23:00"
36354,"2009-10-27","IT_help","Department","FS Students","2009-10-26 12:23:00"

As we can see, both tickets were closed on the 26th, and both tickets had an empty Department field for a few days when they first showed up. I've included my query below, you can see that I've artificially limited the number of columns returned in the second half of the where statement:

SELECT d.ticket_num, d.date, q.name as QueueName, cf.name as CF, cfv.value as CFValue, d.closed
FROM daysCF dcf
INNER JOIN daily_snapshots d on dcf.day_id = d.id
INNER JOIN Queues q on d.queue_id = q.id
INNER JOIN CustomFieldValues cfv on dcf.cfv_id = cfv.id
INNER JOIN CustomFields cf on cf.id = cfv.field_id
WHERE cf.name = 'Department' and (d.ticket_num = 35750 or d.ticket_num = 36354)
ORDER by d.ticket_num, d.date

How can I modify that query so I get a result set that tells me that in October there was one ticket closed for "FS Students" and one ticket closed for "Conference/Visitors"?

A: 

Mysql doesn't have a LAST operator, so you really need to do this using a temporary table.

CREATE TEMPORARY TABLE last_dates SELECT ticket_num, MAX(date) AS date
  FROM daily_snapshots GROUP BY ticket_num

that gets you a table with the last date for each ticket. Then in your main query, join against this table with both the ticket_num and date fields. This will filter out all rows for which the date isn't the latest for the corresponding ticket number.

You might need an index on that temporary table, I'll leave that to you.

Keith Randall
A: 

This is the "greatest-n-per-group" problem that comes up frequently on Stack Overflow.

Here's how I'd solve it in your case:

SELECT d1.ticket_num, d1.date, q.name as QueueName, 
  cf.name as CF, cfv.value as CFValue, d1.closed
FROM daysCF dcf
INNER JOIN daily_snapshots d1 ON (dcf.day_id = d1.id)
INNER JOIN Queues q ON (d1.queue_id = q.id)
INNER JOIN CustomFieldValues cfv ON (dcf.cfv_id = cfv.id)
INNER JOIN CustomFields cf ON (cf.id = cfv.field_id)
LEFT OUTER JOIN daily_snapshots d2 ON (d1.ticket_num = d2.ticket_num AND d1.date < d2.date)
WHERE d2.id IS NULL AND cf.name = 'Department'
ORDER by d1.ticket_num, d1.date;
Bill Karwin
should that LEFT OUTER JOIN end with '...AND d1.date < d2.id)'? I suspect thats a typo, and you meant '...AND d1.date < d2.date)'
Geoff Crompton
Yes, you're right. Sorry about the typo, I've fixed it.
Bill Karwin