Yeah, so I'm filling out a requirements document for a new client project and they're asking for growth trends and performance expectations calculated from existing data within our database.

The best source of data for something like this would be our logs table as we pretty much log every single transaction that occurs within our application.

Now, here's the issue: I don't have a whole lot of experience with MySQL when it comes to calculating cumulative sums and running averages. I've thrown together the following query, which kind of makes sense to me, but it just keeps locking up the command console. The thing takes forever to execute, and there are only 80k records in the test sample.

So, given the following basic table structure:

id   | action | date_created
1    | 'merp' | 2007-06-20 17:17:00
2    | 'foo'  | 2007-06-21 09:54:48
3    | 'bar'  | 2007-06-21 12:47:30
... thousands of records ...
3545 | 'stab' | 2007-07-05 11:28:36

How would I go about calculating the average number of records created for each given day of the week?

day_of_week | average_records_created
1           | 234
2           | 23
3           | 5
4           | 67
5           | 234
6           | 12
7           | 36

I have the following query which makes me want to murderdeathkill myself by casting my body down an elevator shaft... and onto some bullets:

SELECT
    DISTINCT(DAYOFWEEK(DATE(t1.datetime_entry))) AS t1.day_of_week,
    AVG((SELECT COUNT(*) FROM VMS_LOGS t2 WHERE DAYOFWEEK(DATE(t2.date_time_entry)) = t1.day_of_week)) AS average_records_created
FROM VMS_LOGS t1
GROUP BY t1.day_of_week;

Halps? Please, don't make me cut myself again. :'(

+1  A: 

I rewrote your query as:

  SELECT x.day_of_week,
         AVG(x.count) 'average_records_created'
    FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
                 COUNT(*) 'count'
            FROM VMS_LOGS t
        GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week
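
One caveat with the query above: the inner query already groups by DAYOFWEEK, so the outer AVG only ever sees one row per day of the week and simply returns the total. If what you want is "on a typical Monday, x records are created", one possible approach is to count per calendar day first and then average those daily counts. This is only a sketch; it assumes the same VMS_LOGS table and datetime_entry column used above, and the aliases entry_date and records_created are made up for illustration.

```sql
-- Step 1 (inner query): one row per calendar day, with that day's record count.
-- Step 2 (outer query): average those daily counts for each day of the week.
SELECT DAYOFWEEK(d.entry_date) AS day_of_week,      -- 1 = Sunday ... 7 = Saturday
       AVG(d.records_created)  AS average_records_created
  FROM (SELECT DATE(t.datetime_entry) AS entry_date,
               COUNT(*)               AS records_created
          FROM VMS_LOGS t
      GROUP BY DATE(t.datetime_entry)) d
GROUP BY DAYOFWEEK(d.entry_date)
ORDER BY day_of_week;
```

Note that days with no records at all produce no row in the inner query, so they are left out of the average rather than counted as zero.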
OMG Ponies
Thank you for your help! I actually threw together a similar query, but both only show the total records created by day of week. What I'm looking for is something along the lines of, "On any given day of the week, an average of 'x' records are created." The query you provided is still useful, as they expect this data as well! Thanks!
Wilhelm Murdoch
+1  A: 

The reason your query takes so long is the inner select: it runs once for every row of the outer query, so with 80,000 records you are essentially performing 80,000 × 80,000 = 6,400,000,000 row evaluations. With a query like this, your best solution may be to develop a timed reporting system, where the user receives an email when the query is done and the report is constructed, or the user logs in and checks the report afterward.

Even with the optimization written by OMG Ponies (below), you are still looking at around the same number of queries.

  SELECT x.day_of_week,
         AVG(x.count) 'average_records_created'
    FROM (SELECT DAYOFWEEK(t.datetime_entry) 'day_of_week',
                 COUNT(*) 'count'
            FROM VMS_LOGS t
        GROUP BY DAYOFWEEK(t.datetime_entry)) x
  GROUP BY x.day_of_week
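
How much work either version really does is something you can check rather than estimate. As a rough sketch, MySQL's EXPLAIN statement reports the plan for a query. The example assumes the same VMS_LOGS table and datetime_entry column; the alias record_count is a rename used here for illustration.

```sql
-- Ask MySQL how it plans to execute the rewritten query.
EXPLAIN
SELECT x.day_of_week,
       AVG(x.record_count) AS average_records_created
  FROM (SELECT DAYOFWEEK(t.datetime_entry) AS day_of_week,
               COUNT(*)                    AS record_count
          FROM VMS_LOGS t
      GROUP BY DAYOFWEEK(t.datetime_entry)) x
GROUP BY x.day_of_week;
```

In the output, a select_type of DEPENDENT SUBQUERY marks a subquery that is re-evaluated for each outer row, while DERIVED marks a subquery in the FROM clause that is built as a derived table.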
Zyris Development Team
+1  A: 

How far back do you need to go when sampling this information? This solution works as long as it's less than a year, because week numbers repeat from one year to the next.

Because day of week and week number are constant for a record, create a companion table that has the ID, WeekNumber, and DayOfWeek. Whenever you want to run this statistic, just generate the "missing" records from your master table.
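
A minimal sketch of what that companion table and the "fill in the missing records" step might look like. The column types, the id column of the master table, and the use of VMS_LOGS and datetime_entry as the master table and timestamp column are all assumptions carried over from the question, so adjust them to your schema.

```sql
-- Hypothetical companion table: one row per log record, holding the two
-- values that never change once the record exists.
CREATE TABLE IF NOT EXISTS MyCompanionTable (
    ID         INT     NOT NULL PRIMARY KEY,  -- same id as the master table row
    WeekNumber TINYINT NOT NULL,              -- result of WEEK(), 0 to 53
    DayOfWeek  TINYINT NOT NULL               -- result of DAYOFWEEK(), 1 = Sunday ... 7 = Saturday
);

-- Generate only the "missing" records: master rows that have no
-- companion row yet (an anti-join via LEFT JOIN ... IS NULL).
INSERT INTO MyCompanionTable (ID, WeekNumber, DayOfWeek)
SELECT t.id,
       WEEK(t.datetime_entry),
       DAYOFWEEK(t.datetime_entry)
  FROM VMS_LOGS t
  LEFT JOIN MyCompanionTable c ON c.ID = t.id
 WHERE c.ID IS NULL;
```

Running the INSERT again later is safe, since rows that were already copied are skipped.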

Then, your report can be something along the lines of:

select
  DayOfWeek
, count(*)/count(distinct(WeekNumber)) as Average
from
  MyCompanionTable
group by
  DayOfWeek

Of course if the table is too large, then you can instead pre-summarize the data on a daily basis and just use that, and add in "today's" data from your master table when running the report.
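
As a rough illustration of the pre-summarized variant, something like the following could work. The table name DailyLogSummary and its columns are hypothetical, and VMS_LOGS and datetime_entry are again assumed to be the master table and its timestamp column.

```sql
-- Hypothetical summary table: one row per calendar day.
CREATE TABLE IF NOT EXISTS DailyLogSummary (
    entry_date      DATE NOT NULL PRIMARY KEY,
    records_created INT  NOT NULL
);

-- Run once a day (for example from a scheduled job) to summarize yesterday.
-- If yesterday had no records, no row is written for it.
INSERT INTO DailyLogSummary (entry_date, records_created)
SELECT DATE(t.datetime_entry), COUNT(*)
  FROM VMS_LOGS t
 WHERE t.datetime_entry >= CURDATE() - INTERVAL 1 DAY
   AND t.datetime_entry <  CURDATE()
GROUP BY DATE(t.datetime_entry);

-- Report: the summarized days plus today's count taken from the master table.
SELECT DAYOFWEEK(d.entry_date) AS day_of_week,
       AVG(d.records_created)  AS average_records_created
  FROM (SELECT entry_date, records_created
          FROM DailyLogSummary
         UNION ALL
        SELECT DATE(t.datetime_entry), COUNT(*)
          FROM VMS_LOGS t
         WHERE t.datetime_entry >= CURDATE()
      GROUP BY DATE(t.datetime_entry)) d
GROUP BY DAYOFWEEK(d.entry_date);
```

Keep in mind that today's count is partial until the day ends, so including it will pull today's day-of-week average down somewhat.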

scwagner
Using another table to temporarily store this data for my reports worked best. Thanks for the help! I'll keep this method in mind next time. :)
Wilhelm Murdoch