I am assuming that a User might have joined a group long before the billing period, and might not change status during the billing period. This requires your entire table to be scanned to construct a membership table which looks like this:
create table membership (
UserId int not null,
GroupId int not null,
start datetime not null,
end datetime not null,
count int not null,
primary key (UserId, GroupId, end )
);
Once this is correctly populated, the answer you want is easily obtained:
set @sm = '2009-02-01';
set @em = date_sub( date_add( @sm, interval 1 month), interval 1 day);
# sum( datediff( e, s ) + 1 ) -- +1 needed to include last day in billing
select UserId,
GroupId,
sum(datediff( if(end > @em, @em, end),
if(start<@sm, @sm, start) ) + 1 ) as n
from membership
where start <= @em and end >= @sm
group by UserId, GroupId
having n >= 15;
The scan needs to be performed by a cursor (which will not be fast). We need to sort your input table by ActionDate and Action so that "join" events appear before "leave" events. The count field
is there to help cope with pathological cases - where a membership is ended one date, then re-started on the same date, and ended again on the same date, and started again on the same date, etc. In these cases, we increment the count for each start event, and decrement for each end event. We will only close a membership when an end event takes the count down to zero. At the end of populating the membership table, you can query the value of count: closed memberships should have count = 0, open memberships (not yet closed) should have count = 1. Any entries with count outside 0 and 1 should be examined closely - this would indicate a bug somewhere.
The cursor query is:
select UserID as _UserID, GroupID as _GroupID, Date(ActionDate) adate, Action from tbl
order by UserId, GroupId, Date(ActionDate), Action desc;
"Action desc" should break ties so that start events appear before end events should someone join and leave a group on the same date. ActionDate needs to be converted from a datetime to a date because we're interested in units of days.
The actions within the cursor would be the following:
if (Action = 1) then
insert into membership
set start=ActionDate, end='2037-12-31', UserId=_UserId, GroupId=_GroupId, count=1
on duplicate key update set count = count + 1;
elsif (Action == -1)
update membership
set end= if( count=1, Actiondate, end),
count = count - 1
where UserId=_UserId and GroupId=_GroupId and end = '2037-12-31';
end if
I have not given you the exact syntax of the cursor definition required (you can find that in the MySQL manual) because the full code will obscure the idea. In fact, it might be faster to perform the cursor logic within your application - perhaps even building the membership details within the application.
EDIT: Here is the actual code:
create table tbl (
UserId int not null,
GroupId int not null,
Action int not null,
ActionDate datetime not null
);
create table membership (
UserId int not null,
GroupId int not null,
start datetime not null,
end datetime not null,
count int not null,
primary key (UserId, GroupId, end )
);
drop procedure if exists popbill;
delimiter //
CREATE PROCEDURE popbill()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE _UserId, _GroupId, _Action int;
DECLARE _adate date;
DECLARE cur1 CURSOR FOR
select UserID, GroupID, Date(ActionDate) adate, Action
from tbl order by UserId, GroupId, Date(ActionDate), Action desc;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
truncate table membership;
OPEN cur1;
REPEAT
FETCH cur1 INTO _UserId, _GroupId, _adate, _Action;
IF NOT done THEN
IF _Action = 1 THEN
INSERT INTO membership
set start=_adate, end='2037-12-31',
UserId=_UserId, GroupId=_GroupId, count=1
on duplicate key update count = count + 1;
ELSE
update membership
set end= if( count=1, _adate, end),
count = count - 1
where UserId=_UserId and GroupId=_GroupId and end = '2037-12-31';
END IF;
END IF;
UNTIL done END REPEAT;
CLOSE cur1;
END
//
delimiter ;
Here's some test data:
insert into tbl values (1, 10, 1, '2009-01-01' );
insert into tbl values (1, 10, -1, '2009-01-02' );
insert into tbl values (1, 10, 1, '2009-02-03' );
insert into tbl values (1, 10, -1, '2009-02-05' );
insert into tbl values (1, 10, 1, '2009-02-05' );
insert into tbl values (1, 10, -1, '2009-02-05' );
insert into tbl values (1, 10, 1, '2009-02-06' );
insert into tbl values (1, 10, -1, '2009-02-06' );
insert into tbl values (2, 10, 1, '2009-02-20' );
insert into tbl values (2, 10, -1, '2009-05-30');
insert into tbl values (3, 10, 1, '2009-01-01' );
insert into tbl values (4, 10, 1, '2009-01-31' );
insert into tbl values (4, 10, -1, '2009-05-31' );
Here's the code being run, and the results:
call popbill;
select * from membership;
+--------+---------+---------------------+---------------------+-------+
| UserId | GroupId | start | end | count |
+--------+---------+---------------------+---------------------+-------+
| 1 | 10 | 2009-01-01 00:00:00 | 2009-01-02 00:00:00 | 0 |
| 1 | 10 | 2009-02-03 00:00:00 | 2009-02-05 00:00:00 | 0 |
| 1 | 10 | 2009-02-06 00:00:00 | 2009-02-06 00:00:00 | 0 |
| 2 | 10 | 2009-02-20 00:00:00 | 2009-05-30 00:00:00 | 0 |
| 3 | 10 | 2009-01-01 00:00:00 | 2037-12-31 00:00:00 | 1 |
| 4 | 10 | 2009-01-31 00:00:00 | 2009-05-31 00:00:00 | 0 |
+--------+---------+---------------------+---------------------+-------+
6 rows in set (0.00 sec)
Then, check how many billing days appear in feb 09:
set @sm = '2009-02-01';
set @em = date_sub( date_add( @sm, interval 1 month), interval 1 day);
select UserId,
GroupId,
sum(datediff( if(end > @em, @em, end),
if(start<@sm, @sm, start) ) + 1 ) as n
from membership
where start <= @em and end >= @sm
group by UserId, GroupId;
+--------+---------+------+
| UserId | GroupId | n |
+--------+---------+------+
| 1 | 10 | 4 |
| 2 | 10 | 9 |
| 3 | 10 | 28 |
| 4 | 10 | 28 |
+--------+---------+------+
4 rows in set (0.00 sec)
This can be made to just scan table for changes since the last run:
- remove the "truncate membership" statement.
- create a control table containing the last timestamp processed
- calculate the last timestamp you want to include in this run (I would suggest that max(ActionDate) is not good because there might be some out-of-order arrivals coming with earlier timestamps. A good choice is "00:00:00" this morning, or "00:00:00" on the first day of the month).
- alter the cursor query to only include tbl entries between the date of the last run (from the control table) and the calculated last date.
- finally update the control table with the calculated last date.
If you do that, it is also a good idea to pass in a flag that allows you to rebuild from scratch - ie. reset the control table to the start of time, and truncate the membership table before running the usual procedure.