views:

183

answers:

2

I'm looking for a best practice advice how to speed up queries and at the same time to minimize the overhead needed to invoke date/mktime functions. To trivialize the problem I'm dealing with the following table layout:

CREATE TABLE my_table(
  id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,   
  important_data INTEGER,
  date INTEGER);

The user can choose to show 1) all entries between two dates:

SELECT * FROM my_table 
  WHERE date >= ? AND date <= ? 
  ORDER BY date DESC;

Output:

10-21-2009 12:12:12, 10002
10-21-2009 14:12:12, 15002
10-22-2009 14:05:01, 20030
10-23-2009 15:23:35, 300
....

I don't think there is much to improve in this case.

2) Summarize/group the output by day, week, month, year:

SELECT COUNT(*) AS count, SUM(important_data) AS important_data
  FROM my_table 
  WHERE date >= ? AND date <= ? 
  ORDER BY date DESC;

Example output by month:

10-2009, 100002
11-2009, 200030
12-2009, 3000
01-2010, 0 /* <- very important to show empty dates, with no entries in the table! */
....

To accomplish option 2) I'm currently running a very costly for-loop with mktime/date like the following:

for(...){ /* example for group by day */
  $span_from = (int)mktime(0, 0, 0, date("m", $time_min), date("d", $time_min)+$i, date("Y", $time_min));
  $span_to = (int)mktime(0, 0, 0, date("m", $time_min), date("d", $time_min)+$i+1, date("Y", $time_min)); 
  $query = "..";  
  $output = date("m-d-y", ..);
}

What are my ideas so far? Add additional/ redundant columns (INTEGER) for day (20091212), month (200912), week (200942) and year (2009). This way I can get rid of all the unnecessary queries in the for loop. However I'm still facing the problem to very fastly calculate all dates that doesn't have any equivalent in database. One way to simply move the problem could be to let MySQL do the job and simply use one big query (calculate all the dates/use MySQL date functions) with a left join (the data). Would it be wise to let MySQL take the extra load? Anyway I'm reluctant to use all these mktime/date in the for loop. Since I have complete control over the table layout and code even suggestions with major changes are welcome!

Update

Thanks to Greg I came up with the following SQL query. However it still bugs me to use 50 lines of sql statements - build up with php - that maybe could be done faster and more elegantly otherwise:

SELECT * FROM (  
  SELECT DATE_ADD('2009-01-30', INTERVAL 0 DAY) AS day UNION ALL
  SELECT DATE_ADD('2009-01-30', INTERVAL 1 DAY) AS day UNION ALL
  SELECT DATE_ADD('2009-01-30', INTERVAL 2 DAY) AS day UNION ALL
  SELECT DATE_ADD('2009-01-30', INTERVAL 3 DAY) AS day UNION ALL
  ......
  SELECT DATE_ADD('2009-01-30', INTERVAL 50 DAY) AS day ) AS dates
LEFT JOIN (
    SELECT DATE_FORMAT(date, '%Y-%m-%d') AS date, SUM(data) AS data
    FROM test 
    GROUP BY date  
  ) AS results
ON DATE_FORMAT(dates.day, '%Y-%m-%d') = results.date;
+1  A: 

You definitely shouldn't be doing a query inside a loop. You can group like this:

SELECT COUNT(*) AS count, SUM(important_data) AS important_data, DATE_FORMAT('%Y-%m', date) AS month
  FROM my_table 
  WHERE date BETWEEN ? AND ? -- This should be the min and max of the whole range
  GROUP BY  DATE_FORMAT('%Y-%m', date)
  ORDER BY date DESC;

Then pull these into an array keyed by date and loop over your data range as you are doing (that loop should be pretty light on CPU).

Greg
Thanks for the fast response! It will take me some time to test your idea, because I have to switch from INTEGER to DATETIME to successfully use DATE_FORMAT. However it looks promising. If I'm not mistaken one thing is missing, how do I get all these dates with no entries in the table? For example: I have data for 2009-10 and 2009-11, but 2009-12 is empty, but should be visible nevertheless.. should I fall back to PHP functionality for this or is there an elegant mysql solution for this?
merkuro
@merkuro: You can only output data that's *there*. Missing records won't magically appear. If you need to have a record for every day of the year, make a calendar table and make an outer join against that. Such a table needs to be created/filled only once and it needs just 365 rows per year. This way any "gaps" would disappear in the output.
Tomalak
@Tomalak You mean MySQL installations don't come with an integrated magician? Honestly, what purpose had your snarky comment? I already mentioned the option to use a left join, although I see a potential performance problem in that solution because of increased disk usage. Anyway I hoped that someone would come up with a nice range/array like feature I have overlooked.
merkuro
A: 

Another idea is not to use string inside the query. Transform the string parameter to datetime, on mysql.

STR_TO_DATE(str,format)

http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html

Mercer Traieste