views:

52

answers:

1

I have two tables that I need to join... I want to join table1 and table2 on 'id' - however in table two id is not unique. I only want one value returned for table two, and this value represents the sum of a column called 'total_sold' - within a specified date range (say one month), however I want more than one date range at the same time...

SELECT ta.id, sum(tb.total_sold) as total_sold_this_week, sum(tc.total_sold) as total_sold_this_month
FROM table_a as ta
LEFT JOIN table_b as tb ON ta.id=tb.id AND tb.date_sold BETWEEN ADDDATE(NOW(),INTERVAL -1 WEEK) AND NOW()
LEFT JOIN table_b as tc ON ta.id=tc.id AND tc.date_sold BETWEEN ADDDATE(NOW(),INTERVAL -1 MONTH) AND NOW()
GROUP BY ta.id

this works but does not SUM the rows - only returning one row for each id... how do I get the sum from table b instead of only one row??? Please criticise if format of question could use more work - I can rewrite and provide sample data if required - this is a trivialised version of a much larger problem.

-Thanks

+3  A: 

Using Subqueries

One way to solve this would be to use subqueries. LEFT JOIN creates a new "result" for each match in the right table, so using two LEFT JOINs is creating more ROWS than you want. You could just sub select the value you want, but this can be slow:

SELECT ta.id, 
   (SELECT SUM(total_sold) as total_sold 
    FROM table_b 
    WHERE date_sold BETWEEN ADDDATE(NOW(), INTERVAL -1 WEEK) AND NOW()
    AND id=ta.id) as total_sold_this_week, 
   (SELECT SUM(total_sold) as total_sold 
    FROM table_b 
    WHERE date_sold BETWEEN ADDDATE(NOW(), INTERVAL -1 MONTH) AND NOW() 
    AND id = ta.id) as total_sold_this_month 
FROM table_a ta;

Result:

+----+----------------------+-----------------------+
| id | total_sold_this_week | total_sold_this_month |
+----+----------------------+-----------------------+
|  1 |                    3 |                     7 |
|  2 |                    4 |                     4 |
|  3 |                 NULL |                  NULL |
+----+----------------------+-----------------------+
3 rows in set (0.04 sec)

Using SUM(CASE ...)

This method doesn't use subqueries (and will likely be faster on larger data sets). We want to join the table_a and table_b together once, using our "biggest" date range, and then use a SUM() based on a CASE to calculate the "smaller range".

SELECT ta.*, 
  SUM(total_sold) as total_sold_last_month, 
  SUM(CASE 
    WHEN date_sold BETWEEN NOW() - INTERVAL 1 WEEK AND NOW() 
    THEN total_sold
    ELSE 0 
    END) as total_sold_last_week 
FROM table_a AS ta 
LEFT JOIN table_b AS tb 
   ON ta.id=tb.id AND tb.date_sold BETWEEN ADDDATE(NOW(),INTERVAL -1 MONTH) AND NOW() 
GROUP BY ta.id;

This returns nearly the same resultset as the subquery example:

+----+-----------------------+----------------------+
| id | total_sold_last_month | total_sold_last_week |
+----+-----------------------+----------------------+
|  1 |                     7 |                    3 |
|  2 |                     4 |                    4 |
|  3 |                  NULL |                    0 |
+----+-----------------------+----------------------+
3 rows in set (0.00 sec)

The only difference is the 0 instead of NULL. You could summarize as many date ranges as you'd like using this method, but its still probably best to limit the rows returned to the largest range in the ON clause.

Just to show how it works: removing the GROUP BY and SUM() calls, and adding date_sold to the SELECT returns this:

+----+------------+-----------------------+----------------------+
| id | date_sold  | total_sold_last_month | total_sold_last_week |
+----+------------+-----------------------+----------------------+
|  1 | 2010-04-30 |                     2 |                    2 |
|  1 | 2010-04-24 |                     2 |                    0 |
|  1 | 2010-04-24 |                     2 |                    0 |
|  1 | 2010-05-03 |                     1 |                    1 |
|  2 | 2010-05-03 |                     4 |                    4 |
|  3 | NULL       |                  NULL |                    0 |
+----+------------+-----------------------+----------------------+
6 rows in set (0.00 sec)

Now when you GROUP BY id, and SUM() the two total_sold columns you have your results!

Old Advice

Before you brought the two different date ranges into the mix, you could use GROUP BY to group using the table id on table1, and the SUM() aggregate function to add up the rows returned.

SELECT ta.id, SUM(tb.total_sold) as total_sold_this_week
FROM table_a as ta
LEFT JOIN table_b as tb 
ON ta.id=tb.id AND tb.date_sold BETWEEN ADDDATE(NOW(),INTERVAL -3 WEEK) AND NOW()
GROUP BY ta.id
+----+----------------------+
| id | total_sold_this_week |
+----+----------------------+
|  1 |                    7 |
|  2 |                    4 |
|  3 |                 NULL |
+----+----------------------+
3 rows in set (0.00 sec)

The test data

NOW() is 2010-05-03

mysql> select * from table_a; select * from table_b;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

+----+------------+------------+
| id | date_sold  | total_sold |
+----+------------+------------+
|  1 | 2010-04-24 |          2 |
|  1 | 2010-04-24 |          2 |
|  1 | 2010-04-30 |          2 |
|  1 | 2010-05-03 |          1 |
|  2 | 2010-05-03 |          4 |
+----+------------+------------+
5 rows in set (0.00 sec)
gnarf
O.K - thanks for your help but I've moved the goalposts (closer to my real world problem)- I actually need multiple joins onto the same table.
calumbrodie
@calumbrodie - Regardless of how many join's your doing, so long as you can still group by `ta.id` and only summing the one column, the answer will likely be the same. Can you put the full query in your question?
gnarf
I've updated the original question
calumbrodie
@calumbrodie and I believe I've updated with an answer that should work.
gnarf
@gnarf Thanks so much for your help. The SUM(CASE...) Works beautifully. I'm actually pulling a few ranges from the second table (6 ranges) and this way works great. Thanks again!...
calumbrodie