views:

113

answers:

4

Hi everyone. I have a database schema that is similar to the following:

| User   | Event         | Date
|--------|---------------|------
| 111    | Walked dog    | 2009-10-1
| 222    | Walked dog    | 2009-10-2
| 333    | Fed Fish      | 2009-10-5
| 222    | Did Laundry   | 2009-10-6
| 111    | Fed Fish      | 2009-10-7
| 111    | Walked dog    | 2009-10-18
| 222    | Walked dog    | 2009-10-19
| 111    | Fed Fish      | 2009-10-21

I would like to produce a query that returns the maximum number of times a user performs some action within a time period. For example, given a time period of 5 days, what is the maximum number of times user 111 walked the dog?

The most obvious solution would be to start at some zero point and move forward each day, summing up 5 day periods along the way, then taking the maximum total out of all the 5 day windows. the approach seems incredibly costly however.

I would appreciate any suggestions you may have.

EDIT 1:

Thanks for the comments / answers. To respond: - I'm using mySQL v5.0 - There could be any number of events per day (per any time period really) - @Paulo Santos: thanks,but like the comment points out, i need to find the window that produces the most results, the window itself can slide. - @Mark: this looks like an interesting solution, although i recall reading that mySQL does not support backing up or jumping ahead cursors.
- @orbMan: this looks promising. I don't fully understand it yet, but I will give this a try tonight. - @mjv: another promising solution. also looks complicated, but I will give it another look

thanks again!

+2  A: 

For you specific request I'd do something like:

SELECT User, Event, Count(*)
  FROM Table
 WHERE Date between @d1 and @d2
 Group by User, Event

Then it will return the number of time each user performed each task within the specified (@d1 and @d2) time frame.

Paulo Santos
I don't think that's quite what Darren is asking. Rather, he's wants to find the five-day span in which the user performed the task most often. He's asking if there's a way to determine this without executing your query for *every* set of five contiguous days.
Callahad
I'll take a look into it... It's not trivial, but it's not that difficult either.
Paulo Santos
+1  A: 

Here's an alternative algorithm that is cursor based.

Start with two cursors, begin and end, both pointing at the initial row, and current count = 0, and current maximum = 0.

If DATE_DIFF(end.date, begin.date) is more than 5, advance the begin cursor one row. Subtract one from current count if the old row was 'walked the dog'.

If DATE_DIFF(end.date, begin.date) is not more than 5, advance the end cursor one row. Aadd one to current count if the new row is 'walked the dog'. If the current count is greater than current maximum, set current maximum to current count.

Continue until you have covered all rows in the range.

Mark Byers
+1  A: 

The following SQL code addresses the problem in a declarative fashion, rather than a purely procedural/algorithmic fashion. Depending on the situation, it is probably more efficient (as compared with getting the [sorted] data from SQL and then running some algorithm, and even compared with server-side, cursor-based solutions.)

The idea is to get the counts of [relevant/filtered] events, per user, per day in a separate table or CTE. and then for each Day+User, to tally the number of events for this day and for the next 4 days, and, finally to select (per user) row with the maximum of these tally.

SELECT User, Date, COUNT(*) AS EventCount
INTO tmpTableByUsrByDay
FROM myTable
-- WHERE Event = some_targeted_event   --Optional condition(s)
GROUP BY User, Date, COUNT(*)


SELECT DISTINCT User, Date AS FirstDay, 
   MAX(FiveFaysEventCount) AS EventCountForThisAndNext4Days.
FROM (
  SELECT T1.User, T1.Date, SUM(T2.EventCount) FiveDaysEventCount
  FROM tmpTableByUsrByDay T1
  JOIN tmpTableByUsrByDay T2 ON T2.Date >= T1.Date 
       AND T2.Date <= DATEADD(day, 4, T1.Date)
  GROUP BY T1.User, T1.Date
)

Notes:
- It uses a temporary table, although a Common Table Expression (CTE) could be used instead depending on the underlying SQL host.
- The particular name/syntax for the DateAdd() function may vary between SQL implementations.
- Also this implies that the "date" field contains "only" a date, i.e either a date or datetime/smalldatetime where the time part is fixed (to say 00:00). If that wasn't the case, i.e. if the database had date and time in the column, this could be fixed at the level of the CTE/temp-table query.

mjv
+1  A: 
select top 1 x.Date as StartDate, DATEADD(day, 5, x.Date) as EndDate, COUNT(*) as Count
from Event e
inner join Event x on 1=1
where e.Date between x.Date and DATEADD(day, 5, x.Date)
    and e.Event = 'Walked dog'
group by x.Date, DATEADD(day, 5, x.Date)
order by Count desc

Output:

StartDate  EndDate    Count
---------- ---------- -----------
2009-10-01 2009-10-06 2
RedFilter
Updated with a more permissive version that starts counting days from any event day.
RedFilter