views:

196

answers:

2

I'm trying to create what I think is a relatively basic report for an online store, using MySQL 5.1.45

The store can receive payment in multiple currencies. I have created some sample tables with data and am trying to generate a straightforward tabular result set grouped by date and currency so that I can graph these figures.

I want to see each currency that is available per date, with a 0 in the result if there were no sales in that currency for that day. If I can get that to work I want to do the same but also grouped by product id.

In the sample data I have provided there are only 3 currencies and 2 product ids, but in practice there can be any number of each.

I can correctly group by date, but then when I add a grouping by currency my query does not return what I want.

I based my work off this article.

My reporting query, grouped only by date:

SELECT calendar.datefield AS date,
   IFNULL(SUM(orders.order_value),0) AS total_value
FROM orders 
RIGHT JOIN calendar ON (DATE(orders.order_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(order_date)) FROM orders) AND  (SELECT MAX(DATE(order_date)) FROM orders))
GROUP BY date

Now grouped by date and currency:

SELECT calendar.datefield AS date, orders.currency_id,
   IFNULL(SUM(orders.order_value),0) AS total_value
FROM orders 
RIGHT JOIN calendar ON (DATE(orders.order_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(order_date)) FROM orders) AND (SELECT MAX(DATE(order_date)) FROM orders))
GROUP BY date, orders.currency_id

The results I am getting (grouped by date and currency):

+------------+-------------+-------------+
| date       | currency_id | total_value |
+------------+-------------+-------------+
| 2009-08-15 |           3 |       81.94 |
| 2009-08-15 |          45 |       25.00 |
| 2009-08-15 |          49 |      122.60 |
| 2009-08-16 |        NULL |        0.00 |
| 2009-08-17 |          45 |       25.00 |
| 2009-08-17 |          49 |      122.60 |
| 2009-08-18 |           3 |       81.94 |
| 2009-08-18 |          49 |      245.20 |
+------------+-------------+-------------+

The results I want:

+------------+-------------+-------------+
| date       | currency_id | total_value |
+------------+-------------+-------------+
| 2009-08-15 |           3 |       81.94 |
| 2009-08-15 |          45 |       25.00 |
| 2009-08-15 |          49 |      122.60 |
| 2009-08-16 |           3 |        0.00 |
| 2009-08-16 |          45 |        0.00 |
| 2009-08-16 |          49 |        0.00 |
| 2009-08-17 |           3 |        0.00 |
| 2009-08-17 |          45 |       25.00 |
| 2009-08-17 |          49 |      122.60 |
| 2009-08-18 |           3 |       81.94 |
| 2009-08-18 |          45 |        0.00 |
| 2009-08-18 |          49 |      245.20 |
+------------+-------------+-------------+

The schema and data I am using in my tests:

CREATE TABLE orders
(
  id INT PRIMARY KEY AUTO_INCREMENT,
  order_date DATETIME,
  order_id INT,
  product_id INT,
  currency_id INT,
  order_value DECIMAL(9,2),
  customer_id INT
);
INSERT INTO orders (order_date, order_id, product_id, currency_id, order_value, customer_id)
  VALUES
  ('2009-08-15 10:20:20', '123', '1', '45', '12.50', '322'),
  ('2009-08-15 12:30:20', '124', '1', '49', '122.60', '400'),
  ('2009-08-15 13:41:20', '125', '1', '3', '40.97', '324'),  
  ('2009-08-15 10:20:20', '126', '2', '45', '12.50', '345'),
  ('2009-08-15 13:41:20', '131', '2', '3', '40.97', '756'),

  ('2009-08-17 10:20:20', '3234', '1', '45', '12.50', '1322'),
  ('2009-08-17 10:20:20', '4642', '2', '45', '12.50', '1345'),
  ('2009-08-17 12:30:20', '23', '2', '49', '122.60', '3142'),

  ('2009-08-18 12:30:20', '2131', '1', '49', '122.60', '4700'),
  ('2009-08-18 13:41:20', '4568', '1', '3', '40.97', '3274'),  
  ('2009-08-18 12:30:20', '956', '2', '49', '122.60', '3542'),
  ('2009-08-18 13:41:20', '443', '2', '3', '40.97', '7556');

CREATE TABLE currency
  (
    id INT PRIMARY KEY,
    name VARCHAR(255)
  );
INSERT INTO currency (id, name)
  VALUES
  (3, 'Euro'),
  (45, 'US Dollar'),
  (49, 'CA Dollar');


CREATE TABLE calendar (datefield DATE);

  DELIMITER |
  CREATE PROCEDURE fill_calendar(start_date DATE, end_date DATE)
  BEGIN
    DECLARE crt_date DATE;
    SET crt_date=start_date;
    WHILE crt_date < end_date DO
      INSERT INTO calendar VALUES(crt_date);
      SET crt_date = ADDDATE(crt_date, INTERVAL 1 DAY);
    END WHILE;
  END |
  DELIMITER ;

CALL fill_calendar('2008-01-01', '2011-12-31');
+1  A: 

You're going to find it hard to get the results you want there unless you put a dummy order into the system for every currency on every day (which could be easily done in the fill_calendar routine for testing).

Right now, what you want is to join calendar, orders and currency using a common link; but there isn't such a link (you've links from calendar to orders and orders to currency but nothing from calendar to currency).

If you created those dummy orders, then you wouldn't need to alter the schema; the data itself would provide the link needed. Otherwise, you'd probably have to alter the schema a bit.

Mark Dennehy
+1  A: 

I'm posting this as an answer as it may be quite large for a comment. Thanks to Mark for pointing me in the right direction. Mark's answer worked but meant a schema change to the calendar table which I wasn't keen on, as I may need the report to be more flexible in future (e.g. group by *product_id*)

This works - however it may not be elegant. I'll leave this question "unanswered" for a few days to see if anyone can come up with a better solution.

Additional schema and data (addition of a products table):

CREATE TABLE products
  (
    id INT PRIMARY KEY,
    name VARCHAR(255)
  );
INSERT INTO products (id, name)
  VALUES
  (1, 'Widget'),
  (2, 'Midget'),
  (3, 'Gidget');

Now using this query I get the answer I want:

SELECT cal.date AS date, currency.name AS currency, products.name AS product,
   IFNULL(SUM(orders.order_value),0) AS total_value
FROM orders 
RIGHT JOIN 
(
SELECT cal.datefield AS date, cur.id AS currency, prod.id AS product
FROM calendar cal
CROSS JOIN currency cur
CROSS JOIN products prod
) cal
 ON (DATE(orders.order_date) = cal.date)
    AND orders.currency_id = cal.currency
    AND orders.product_id = cal.product
JOIN currency ON cal.currency = currency.id
JOIN products ON cal.product = products.id
WHERE (cal.date BETWEEN (SELECT MIN(DATE(order_date)) FROM orders) AND (SELECT MAX(DATE(order_date)) FROM orders))
GROUP BY date, cal.currency,cal.product

This gives me all data points for all days, or zero if they don't exist.

+------------+-----------+--------+-------------+
| date       | currency  | product| total_value |
+------------+-----------+--------+-------------+
| 2009-08-15 | Euro      | Widget |       40.97 |
| 2009-08-15 | Euro      | Midget |       40.97 |
| 2009-08-15 | Euro      | Gidget |        0.00 |
| 2009-08-15 | US Dollar | Widget |       12.50 |
| 2009-08-15 | US Dollar | Midget |       12.50 |
| 2009-08-15 | US Dollar | Gidget |        0.00 |
| 2009-08-15 | CA Dollar | Widget |      122.60 |
| 2009-08-15 | CA Dollar | Midget |        0.00 |
| 2009-08-15 | CA Dollar | Gidget |        0.00 |
| 2009-08-16 | Euro      | Widget |        0.00 |
| 2009-08-16 | Euro      | Midget |        0.00 |
| 2009-08-16 | Euro      | Gidget |        0.00 |
| 2009-08-16 | US Dollar | Widget |        0.00 |
| 2009-08-16 | US Dollar | Midget |        0.00 |
| 2009-08-16 | US Dollar | Gidget |        0.00 |
| 2009-08-16 | CA Dollar | Widget |        0.00 |
| 2009-08-16 | CA Dollar | Midget |        0.00 |
| 2009-08-16 | CA Dollar | Gidget |        0.00 |
| 2009-08-17 | Euro      | Widget |        0.00 |
| 2009-08-17 | Euro      | Midget |        0.00 |
| 2009-08-17 | Euro      | Gidget |        0.00 |
| 2009-08-17 | US Dollar | Widget |       12.50 |
| 2009-08-17 | US Dollar | Midget |       12.50 |
| 2009-08-17 | US Dollar | Gidget |        0.00 |
| 2009-08-17 | CA Dollar | Widget |        0.00 |
| 2009-08-17 | CA Dollar | Midget |      122.60 |
| 2009-08-17 | CA Dollar | Gidget |        0.00 |
| 2009-08-18 | Euro      | Widget |       40.97 |
| 2009-08-18 | Euro      | Midget |       40.97 |
| 2009-08-18 | Euro      | Gidget |        0.00 |
| 2009-08-18 | US Dollar | Widget |        0.00 |
| 2009-08-18 | US Dollar | Midget |        0.00 |
| 2009-08-18 | US Dollar | Gidget |        0.00 |
| 2009-08-18 | CA Dollar | Widget |      122.60 |
| 2009-08-18 | CA Dollar | Midget |      122.60 |
| 2009-08-18 | CA Dollar | Gidget |        0.00 |
+------------+-----------+--------+-------------+

This uses a JOIN on a subquery which I think is not very performant, but it works on this small dataset - I will generate some more data and see how it goes.

Shane O'Grady