tags:

views:

115

answers:

4

Update: I am editing my question in the hope of getting a better answer. I see this is not so simple but I cant believe there is not a simpler solution than what has been mentioned so far. I am now looking to see if there is some kind of php, mysql solution to deal with this in the most efficent way. I have modified my question below to try and make things clearer

I have a table with the following fields:

  • UserID
  • GroupID
  • Action
  • ActionDate

This table simply stores whenever a user on my system is added to a group (action = 1) or removed from a group (action = -1). The datetime is recorded whenever one of the above actions take place, as ActionDate

A group is charged for every user they have each month as long as the user was part of the group for at least 15 days of that billing month (a billing month means not the beginning of a month necessarily, could be from the 15th of Jan to 15th of Feb)

I bill my groups every month at the begining of a billing month for all users who are part of their group at that time. Now over the course of the month they might add new users to their group or remove existing users from their group. If they removed a user I need to know if the user was part of the group for at least 15 days of that billing month. If he was then do nothing, if not then the group needs to be refunded for that user (as they paid for the user at the beginning of the month but he was part of the group for less than 15 days) If they added a user and the user was in the group for at least 15 days (ie added within 15 days of billing month AND was not removed before 15 days were up) then the group must be charged for this user. If the user did not end up with 15 days as part of the group then we do nothing (no charge).

Some of the additional complexities are:

  • A user might be added or removed multiple times over the course of that billing month and we would need to keep track of total number of days that he was part of the group
  • We need to be able to differentiate between users who are being removed (ultimately) or added (ultimately) in order to correctly bill the group. (for example a user who has 10 days as part of the group - if he was ultimately removed from the group then we issue a refund. If he was being added to the group then we dont charge - because less than 10 days)
  • In any given billing month the user might not appear in this table since their status was not changed - ie they remained a part of the group or were never part of the group. The truth is that nothing needs to be done with these users as if necessary they will be included in the base monthly calculation of "how many users in group today"

I am starting to realize there is no simple mysql solution and i need a php, mysql combo. Please help!!!

Here is my most recent sql attempt but it does not incorporate all the issues i have discussed below:

SELECT * 
  FROM groupuserlog 
 where action = 1 
   and actiondate >= '2010-02-01' 
   and actiondate < date_add('2010-02-01',INTERVAL 15 DAY) 
   and userid not in (select userid 
                        from groupuserlog 
                       where action = -1 
                         and actiondate < '2010-03-01' 
                         and actiondate > date_add('2010-02-01', INTERVAL 15 DAY))
A: 

Not sure about your table but perhaps something like?

SELECT COUNT(UserID)
FROM MyTable
WHERE MONTH(ActionDate) = 3
AND GroupID = 1
AND Action = 1
GROUP BY UserID
Jonathan
That will tell me how many users were added in a given month. I need to know how many days they remained part of the group - therefore I need to take into account if they were removed or not in that month
Gotts
So you want to query for a given month to find out how many users where in that group for at least 15 days?
Jonathan
exactly, baring in mind that users mightve been added or removed multiple times in that month
Gotts
sounds to me like you need to have another table with a value indicating days in a group, then increment/decrement as actions are performed.
Jonathan
A: 

I think all the complexity lies in how to figure out the adjacent remove action for a given add action. So, how about adding a column pointing at the primary key of the subsequent action?

Supposing that column is called NextID,

How many users joined a group in a given month and remained part of that group for at least 15 days:

SELECT COUNT(DISTINCT UserID)
FROM MyTable AS AddedUsers
LEFT OUTER JOIN MyTable
  ON MyTable.ID = AddedUsers.NextID
  AND MyTable.ActionDate > DATE_ADD(AddedUsers.ActionDate, INTERVAL 15 DAY)
  AND MyTable.Action = -1
WHERE MONTH(AddedUsers.ActionDate) = 3 AND YEAR(AddedUsers.ActionDate) = 2012
  AND AddedUsers.GroupID = 1
  AND AddedUsers.Action = 1
  AND MONTH(DATE_ADD(AddedUsers.ActionDate, INTERVAL 15 DAY)) = 3;

How many people were removed from a group in a given month that did not remain in a group for at least 15 days:

SELECT COUNT(DISTINCT UserID)
FROM MyTable AS RemovedUsers
INNER JOIN MyTable
  ON MyTable.NextID = RemovedUsers.ID
  AND RemovedUsers.ActionDate <= DATE_ADD(MyTable.ActionDate, INTERVAL 15 DAY)
  AND MyTable.Action = 1
WHERE MONTH(RemovedUsers.ActionDate) = 3 AND YEAR(RemovedUsers.ActionDate) = 2012
  AND RemovedUsers.GroupID = 1
  AND RemovedUsers.Action = -1;
ento
This looks like a step in the right directions.A few questions - it doesnt seem to me like it will work if there is more than one add or remove action per user in a given month. Likewise if there is ONLY a remove or ONLY an add in a given month.On a sidepoint why do you need the NextID column, doesnt it work to just join on the primary key of both tables, "INNER JOIN MyTable ON MyTable.ID = RemovedUsers.ID"
Gotts
+1  A: 

I am assuming that a User might have joined a group long before the billing period, and might not change status during the billing period. This requires your entire table to be scanned to construct a membership table which looks like this:

create table membership (
   UserId int not null,
   GroupId int not null,
   start datetime not null,
   end datetime not null,
   count int not null,
   primary key (UserId, GroupId, end )
);

Once this is correctly populated, the answer you want is easily obtained:

set @sm = '2009-02-01';
set @em = date_sub( date_add( @sm, interval 1 month), interval 1 day);

# sum( datediff( e, s ) + 1 ) -- +1 needed to include last day in billing

select UserId, 
       GroupId,  
       sum(datediff( if(end > @em, @em, end), 
                     if(start<@sm, @sm, start) ) + 1 ) as n
from membership 
where start <= @em and end >= @sm
group by UserId, GroupId
having n >= 15;

The scan needs to be performed by a cursor (which will not be fast). We need to sort your input table by ActionDate and Action so that "join" events appear before "leave" events. The count field is there to help cope with pathological cases - where a membership is ended one date, then re-started on the same date, and ended again on the same date, and started again on the same date, etc. In these cases, we increment the count for each start event, and decrement for each end event. We will only close a membership when an end event takes the count down to zero. At the end of populating the membership table, you can query the value of count: closed memberships should have count = 0, open memberships (not yet closed) should have count = 1. Any entries with count outside 0 and 1 should be examined closely - this would indicate a bug somewhere.

The cursor query is:

select UserID as _UserID, GroupID as _GroupID, Date(ActionDate) adate, Action from tbl 
order by UserId, GroupId, Date(ActionDate), Action desc;

"Action desc" should break ties so that start events appear before end events should someone join and leave a group on the same date. ActionDate needs to be converted from a datetime to a date because we're interested in units of days.

The actions within the cursor would be the following:

if (Action = 1) then 
  insert into membership 
    set start=ActionDate, end='2037-12-31', UserId=_UserId, GroupId=_GroupId, count=1
    on duplicate key update set count = count + 1;
elsif (Action == -1) 
  update membership 
    set end= if( count=1, Actiondate, end),
        count = count - 1 
    where UserId=_UserId and GroupId=_GroupId and end = '2037-12-31';
end if

I have not given you the exact syntax of the cursor definition required (you can find that in the MySQL manual) because the full code will obscure the idea. In fact, it might be faster to perform the cursor logic within your application - perhaps even building the membership details within the application.

EDIT: Here is the actual code:

create table tbl (
   UserId int not null,
   GroupId int not null,
   Action int not null,
   ActionDate datetime not null
);

create table membership (
   UserId int not null,
   GroupId int not null,
   start datetime not null,
   end datetime not null,
   count int not null,
   primary key (UserId, GroupId, end )
);

drop procedure if exists popbill;
delimiter //

CREATE PROCEDURE popbill()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE _UserId, _GroupId, _Action int;
  DECLARE _adate date;
  DECLARE cur1 CURSOR FOR 
  select UserID, GroupID, Date(ActionDate) adate, Action 
  from tbl order by UserId, GroupId, Date(ActionDate), Action desc;

  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  truncate table membership;

  OPEN cur1;

  REPEAT
    FETCH cur1 INTO _UserId, _GroupId, _adate, _Action;
    IF NOT done THEN
       IF _Action = 1 THEN
          INSERT INTO membership
          set start=_adate, end='2037-12-31', 
              UserId=_UserId, GroupId=_GroupId, count=1
          on duplicate key update count = count + 1;
       ELSE
          update membership 
          set end= if( count=1, _adate, end),
              count = count - 1 
          where UserId=_UserId and GroupId=_GroupId and end = '2037-12-31';
       END IF;
    END IF;
  UNTIL done END REPEAT;

  CLOSE cur1;
END
//

delimiter ;

Here's some test data:

insert into tbl values (1, 10, 1, '2009-01-01' );
insert into tbl values (1, 10, -1, '2009-01-02' );
insert into tbl values (1, 10, 1, '2009-02-03' );
insert into tbl values (1, 10, -1, '2009-02-05' );
insert into tbl values (1, 10, 1, '2009-02-05' );
insert into tbl values (1, 10, -1, '2009-02-05' );
insert into tbl values (1, 10, 1, '2009-02-06' );
insert into tbl values (1, 10, -1, '2009-02-06' );
insert into tbl values (2, 10, 1, '2009-02-20' );
insert into tbl values (2, 10, -1, '2009-05-30');
insert into tbl values (3, 10, 1, '2009-01-01' );
insert into tbl values (4, 10, 1, '2009-01-31' );
insert into tbl values (4, 10, -1, '2009-05-31' );

Here's the code being run, and the results:

call popbill;
select * from membership;

+--------+---------+---------------------+---------------------+-------+
| UserId | GroupId | start               | end                 | count |
+--------+---------+---------------------+---------------------+-------+
|      1 |      10 | 2009-01-01 00:00:00 | 2009-01-02 00:00:00 |     0 |
|      1 |      10 | 2009-02-03 00:00:00 | 2009-02-05 00:00:00 |     0 |
|      1 |      10 | 2009-02-06 00:00:00 | 2009-02-06 00:00:00 |     0 |
|      2 |      10 | 2009-02-20 00:00:00 | 2009-05-30 00:00:00 |     0 |
|      3 |      10 | 2009-01-01 00:00:00 | 2037-12-31 00:00:00 |     1 |
|      4 |      10 | 2009-01-31 00:00:00 | 2009-05-31 00:00:00 |     0 |
+--------+---------+---------------------+---------------------+-------+
6 rows in set (0.00 sec)

Then, check how many billing days appear in feb 09:

set @sm = '2009-02-01';
set @em = date_sub( date_add( @sm, interval 1 month), interval 1 day);

select UserId, 
       GroupId,  
       sum(datediff( if(end > @em, @em, end), 
                 if(start<@sm, @sm, start) ) + 1 ) as n
from membership 
where start <= @em and end >= @sm
group by UserId, GroupId;

+--------+---------+------+
| UserId | GroupId | n    |
+--------+---------+------+
|      1 |      10 |    4 |
|      2 |      10 |    9 |
|      3 |      10 |   28 |
|      4 |      10 |   28 |
+--------+---------+------+
4 rows in set (0.00 sec)

This can be made to just scan table for changes since the last run:

  1. remove the "truncate membership" statement.
  2. create a control table containing the last timestamp processed
  3. calculate the last timestamp you want to include in this run (I would suggest that max(ActionDate) is not good because there might be some out-of-order arrivals coming with earlier timestamps. A good choice is "00:00:00" this morning, or "00:00:00" on the first day of the month).
  4. alter the cursor query to only include tbl entries between the date of the last run (from the control table) and the calculated last date.
  5. finally update the control table with the calculated last date.

If you do that, it is also a good idea to pass in a flag that allows you to rebuild from scratch - ie. reset the control table to the start of time, and truncate the membership table before running the usual procedure.

Martin
Thats quite a piece of work. I will have to work slowly through this. My gut feeling is that this is overkill for my purposes. I am thinking of just rewriting the table which stores the information and approaching it differently. But will first try work through this solution. thanks
Gotts
I'll try and code up an example using cursors over the weekend.
Martin
if you can put something together that would be a big help. I dont need you to code the whole thing - I just need an approach as I am stuck!!!
Gotts
UpVote for taking so much effort
Andy
Thank you for this unbelievable piece of work!!!! Really appreicate it!
Gotts
A: 

I started working through Martin's proposed solution and realised that although it is probably the right path to take i decided that I would go with what I know best which is php as opposed to complex sql. Although for sure less efficient, since my table sizes will never be too big it makes the most sense for me.

In the end I wrote a simple query which creates a user history in chronological order for all user activity in the group for a given month.

SELECT Concat(firstname,' ',lastname) as name, username, UserID,ACTION , Date(ActionDate), Unix_Timestamp(ActionDate) as UN_Action, DateDiff('$enddate', actiondate ) AS DaysTo, DateDiff( actiondate, '$startdate' ) AS DaysFrom
        FROM `groupuserlog` inner join users on users.id = groupuserlog.userid WHERE groupuserlog.groupid = $row[groupid] AND ( actiondate < '$enddate' AND actiondate >= '$startdate') ORDER BY userid, actiondate

I then loop through the result set and collect all data for each user. The first action (either add or remove) of the month indicates whether or not this is a user was someone who previously existed in the group or not. I then go through the history and simply calculate the number of active days - at the end of it I just see if a refund or charge shoudl be issued, depending on whether the user previously existed in the group or not.

Its not so pretty but it does the job cleanly and allows me for some additional processing which I need to do.

Thanks to everyone fo the help.

My php code, if anyone is interested looks as follows:

while($logrow = mysql_fetch_row($res2)) {

                list($fullname, $username, $guserid,$action,$actiondate,$uxaction,$daysto,$daysfrom) = $logrow;
                if($action == 1)
                    $actiondesc = "Added";
                else
                    $actiondesc = "Removed";


                //listing each user by individual action and building a history
                //the first action is very important as it defines the previous action

                if($curruserid != $guserid) {

                    if($curruserid > 0) {
                        //new user history so reset and store previous user value
                        if($wasMember) {
                            //this was an existing member so check if need refund (if was not on for 15 days)
                            $count = $basecount + $count;
                            echo "<br>User was member and had $count days usage";
                            if($count< 15) {
                                array_push($refundarrinfo, "$fullname (#$guserid $username)");
                                array_push($refundarr, $guserid);
                                echo " REFUND";
                            } else
                                echo " NONE";

                        } else {
                            //this user was not an existing member - see if need to charge (ie if was on for min 15 days)
                            $count = $basecount + $count;
                            echo "<br>User was not a member and was added for $count days usage";
                            if($count >= 15) {
                                array_push($billarrinfo, "$fullname (#$guserid $username)");
                                array_push($billarr, $guserid);
                                echo " CHARGE";
                            } else
                                echo " NONE";
                        }
                    }

                    $basecount = 0;
                    $count = 0;
                    $prev_uxaction = 0;

                    //setup new user - check first action
                     echo "<br><hr><br>$guserid<br>$actiondesc - $actiondate"; // - $daysto - $daysfrom";
                    if($action == 1)
                        $wasMember = FALSE;
                    else {
                        //for first action - if is a remove then store in basecount the number of days that are for sure in place
                        $basecount = $daysfrom;
                        $wasMember = TRUE; //if doing a remove myust have been a member
                    }

                } else
                    echo "<br>$actiondesc - $actiondate";// - $daysto - $daysfrom";

                $curruserid = $guserid;

               if($action == 1) { //action = add
                    $count = $daysto;
                    $prev_uxaction = $uxaction;  //store this actiondate in case needed for remove calculation
                } else { //action = remove
                    //only do something if this is a remove coming after an add - if not it has been taken care of already
                    if($prev_uxaction != 0) {
                        //calc no. of days between previous date and this date and overwrite count by clearing and storing in basecount
                        $count = ($uxaction - $prev_uxaction)/(60 * 60 * 24);
                        $basecount = $basecount + $count;
                        $count = 0; //clear the count as it is stored in basecount
                    }
                }
Gotts