views:

2243

answers:

3

I have a table with records that look like this:

CREATE TABLE sample (
  ix int unsigned auto_increment primary key,
  start_active datetime,
  last_active datetime
);

I need to know how many records were active on each of the last 30 days. The days should also be sorted incrementing so they are returned oldest to newest.

I'm using MySQL and the query will be run from PHP but I don't really need the PHP code, just the query.

Here's my start:

SELECT COUNT(1) cnt, DATE(?each of last 30 days?) adate
FROM sample
WHERE adate BETWEEN start_active AND last_active
GROUP BY adate;
A: 

SQL is great at matching sets of values that are stored in the database, but it isn't so great at matching sets of values that aren't in the database. So one easy workaround is to create a temp table containing the values you need:

CREATE TEMPORARY TABLE days_ago (d SMALLINT);
INSERT INTO days_ago (d) VALUES
  (0), (1), (2), ... (29), (30);

Now you can compare a date that is d days ago to the span between start_active and last_active of each row. Count how many matching rows in the group per value of d and you've got your count.

SELECT CURRENT_DATE - d DAYS, COUNT(*) cnt, 
FROM days_ago
 LEFT JOIN sample ON (CURRENT_DATE - d DAYS BETWEEN start_active AND last_active)
GROUP BY d
ORDER BY d DESC; -- oldest to newest

Another note: you can't use column aliases defined in your select-list in expressions until you get to the GROUP BY clause. Actually, in standard SQL you can't use them until the ORDER BY clause, but MySQL supports using aliases in GROUP BY and HAVING clauses as well.

Bill Karwin
+3  A: 

Do an outer join.

No table? Make a table. I always keep a dummy table around just for this.

create table artificial_range( 
  id int not null primary key auto_increment, 
  name varchar( 20 ) null ) ;

-- or whatever your database requires for an auto increment column

insert into artificial_range( name ) values ( null )
-- create one row.

insert into artificial_range( name ) select name from artificial_range;
-- you now have two rows

insert into artificial_range( name ) select name from artificial_range;
-- you now have four rows

insert into artificial_range( name ) select name from artificial_range;
-- you now have eight rows

--etc.

insert into artificial_range( name ) select name from artificial_range;
-- you now have 1024 rows, with ids 1-1024

Now make it convenient to use, and limit it to 30 days, with a view:

Edit: JR Lawhorne notes:

You need to change "date_add" to "date_sub" to get the previous 30 days in the created view.

Thanks JR!

create view each_of_the_last_30_days as
select date_sub( now(), interval (id - 1) day ) as adate
from artificial_range where id < 32;

Now use this in your query (I haven't actually tested your query, I'm just assuming it works correctly):

Edit: I should be joining the other way:

SELECT COUNT(*) cnt, b.adate
FROM  each_of_the_last_30_days b
left outer join sample a 
 on ( b.adate BETWEEN a.start_active AND a.last_active)
GROUP BY b.adate;
tpdi
You need to change "date_add" to "date_sub" to get the previous 30 days in the created view.
JR Lawhorne
A: 

Turn the date into a unix timestamp, which is seconds, in your query and then just look for the difference to be <= the number of seconds in a month.

You can find more information here: http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_unix-timestamp

If you need help with the query please let me know, but MySQL has nice functions for dealing with datetime.

[Edit] Since I was confused as to the real question, I need to finish the lawn but before I forget I want to write this down.

To get a count of the number by day you will want your where clause to be as I described above, to limit to the past 30 days, but you will need to group by day, and so select by converting each start to a day of the month and then do a count of those.

This assumes that each use will be limited to one day, if the start and end dates can span several days then it will be trickier.

James Black
Turn which date into a unix timestamp? `CURRENT_DATE - INTERVAL 30 DAY`, `start_active` or `end_active`? How does this help the OP group the count of active records *for each day* of the last thirty?
pilcrow
Turn both start_active and last_active into unix_timestamp, or, though I haven't tried this, you could use subtime() to subtract the times and then convert, or, in the page I suggested above you can use something like this: select ID from MESSAGE where SENT_TIME < (CURDATE() - INTERVAL 5 DAY);Regardless, just look through the time functions and you should be able to find what will work best for you.
James Black
@James Black, your answer phrasing suggests that you think the OP wants to know count of records overlapping with a time window of 30 days -- but that's the wrong question, he wants instead to know the count of overlapping records *each day* for the last 30 days. (Yes, the date arithmetic is easy in SQL native types or using integral epoch stamps.) Perhaps you could post a code solution to illustrate your proposal?
pilcrow
Oh, thank you for the correction. I have done something like this before, I will need to think about how I did it, as I remember the query was very long and complex.
James Black