I'm trying to do a cumulative aggregate in MySQL.
My raw data looks like this: (my dates are month/day/year)
user_id created_at source
1 1/1/01 foo
2 1/1/01 foo
3 1/2/01 bar
4 1/2/01 foo
5 1/3/01 foo
....
Given this, I'd like to make a result that looks like this:
date source total_in_source
1/1/01 foo 2
1/2/01 foo 3
1/3/01 foo 4
where total_in_source is an aggregate of the number of users created_at between the first and the current value of date.
I can do this using a correlated subquery, but that's a bit inefficient:
select
date(user.created_at) d,
user.source as user_source,
(select count(*) from users u where u.source = user_source and month(u.created_at) <= month_joined) as total_users_source,
from users user group by d, user_source
I've done some looking around and it seems like doing this might be a step in the right direction:
select date(u1.created_at) as 'd',
u1.source as 'source',
count(distinct u2.id) as count_users
from users u1, users u2
where
u2.created_at BETWEEN DATE(u1.created_at) AND DATE_ADD(DATE(u1.created_at), interval 1 day) and
group by d, source
but even this is a self-join and produces n^2 rows to iterate over (users * users). Is there a best-practice on how to do this?
thanks in advance.