views:

83

answers:

2

Hi,

I have 2 tables: members, orders.

Members: MemberID, DateCreated
Orders:  OrderID, DateCreated, MemberID

I want to find out the number of new members in a given month broken down into number of order groups, eg. 5+, 4, 3, 2, 1, 0

I've got the query to work out the number of orders for a member but how can I get these values in one query?

SELECT
  COUNT(o.orderid) AS Purchases
FROM
  members m
  LEFT JOIN orders o ON o.memberid = m.memberid
    AND MONTH(o.DateCreated) = 8
WHERE
  MONTH(m.DateCreated) = 8
GROUP BY
  m.memberid
ORDER BY
  COUNT(o.orderid) DESC
A: 

There's a couple ways you can do this, some of which could be fairly complicated.

This is the way I would do, focusing on the new member part rather than the count part:

  SELECT COUNT(M.MemberID),
         (SELECT COUNT(*) FROM Orders O WHERE O.MemberId = M.MemberId AND O.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)) AS num_orders
    FROM Members M
   WHERE M.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)
GROUP BY num_orders

I did the lookup with dates because it would be faster (it could make use of an index whereas MONTH(M.DateCreated) would always do a full table scan, but you can change it back if really do need all orders/members from a given month).

EDIT: I forgot to handle the 5+ part of the question so here's an option for that:

  SELECT COUNT(M.MemberID),
         (SELECT IF(COUNT(*) >= 5, '5+', COUNT(*)) FROM Orders O WHERE O.MemberId = M.MemberId AND O.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)) AS num_orders
    FROM Members M
   WHERE M.DateCreated BETWEEN '2010-08-01' AND DATE_ADD('2010-08-01', INTERVAL 1 MONTH)
GROUP BY num_orders
Rob Van Dam
Thanks Rob, a real big help!
Bill
+1  A: 

You'll need to use sub-queries in the FROM clause, or a series of WITH statements before the main SELECT statement (if your DBMS supports that notation). You'll also need to fix your queries so that you don't report on the people who joined in August 2009 as well as those who joined in August 2010.

Simpler Answer

The 'harder answer' below is the much amended original query, and I've left it because it shows how I developed the answer. The following answer is simpler; it leverages the fact that COUNT(Column) returns 0 if there are no non-null values in the column to be counted.

It uses a table BaseCounts to control which aggregates should appear:

CREATE TEMP TABLE BaseCounts
(
    NumOrders CHAR(2) NOT NULL PRIMARY KEY
);
INSERT INTO BaseCounts VALUES("0 ");
INSERT INTO BaseCounts VALUES("1 ");
INSERT INTO BaseCounts VALUES("2 ");
INSERT INTO BaseCounts VALUES("3 ");
INSERT INTO BaseCounts VALUES("4 ");
INSERT INTO BaseCounts VALUES("5+");

SELECT B.NumOrders, COUNT(N.MemberID) AS NumNewMembers
  FROM BaseCounts AS B LEFT OUTER JOIN
       (SELECT MemberID, CASE WHEN NumOrders < 5
                         THEN CAST(NumOrders AS CHAR(2))
                         ELSE "5+" END AS NumOrders
          FROM (SELECT M.MemberID, COUNT(O.OrderID) AS NumOrders
                  FROM Members AS M LEFT OUTER JOIN Orders AS O
                    ON M.MemberID = O.MemberID AND
                       YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
                 WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
                 GROUP BY M.MemberID
               ) AS NMO
       ) AS N
    ON B.NumOrders = N.NumOrders
 GROUP BY B.NumOrders
 ORDER BY B.NumOrders;

The CREATE TEMP TABLE notation is for IBM Informix Dynamic Server (version 11.50 used for testing). The table vanishes at the end of the session (or when explicitly dropped), and it is private to a session. It could be a permanent base table (drop the keyword TEMP) instead.

The sub-query tagged NMO (for new member orders) is very important. The filter condition on O.DateCreated must appear in the ON clause and not in the WHERE clause; otherwise, you do not get the zero counts that are needed. The COUNT(Column) notation is used twice.

The explanation in the harder answer shown below will help you understand details not explained in this simpler answer. Although 'simpler', I would not regard it as 'simple'. The whole answer shows the importance of iterating your design; I could not readily have produced the simpler answer without having gone through the effort of producing the harder one.


Harder Answer

This was the original development of the answer. I believe it might still have its uses to show how I approached the problem. With this as a basis, it was a relatively simple matter to remove the extra material when developing the simpler answer above.

Counting the zeroes is also surprisingly hard, as is doing '5+'. So, let's tackle it in stages.

New members with 1 or more purchases

SELECT M.MemberID, COUNT(*) AS NumOrders
  FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
 WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
   AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
 GROUP BY M.MemberID

Call that list 'NZO' (for 'non-zero orders'). Note that a LEFT OUTER JOIN would assign people to group '1' even if they placed no orders - not the desired result.

New members with 0 purchases

SELECT M.MemberID, 0 AS NumOrders
  FROM Members AS M
 WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
   AND NOT EXISTS (SELECT * FROM Orders AS O
                    WHERE YEAR(O.DateCreated) = 2010
                      AND MONTH(O.DateCreated) = 8
                      AND O.MemberID = M.MemberID
                  )

That is a nasty query because of the correlated sub-query, but it avoids referencing NZO. An alternative would be to find the list of members who joined in the reference month, and subtract from that the list of members with 1 or more orders (NZO).

Call that list 'WZO' (for 'with zero orders').

Clearly, NZO and WZO have no members in common - the UNION or UNION ALL of these gives a list of new members and the number of orders they placed.

New members in the six categories

SELECT MemberID, CAST(NumOrders AS CHAR(2)) AS NumOrders
  FROM WZO
UNION
SELECT MemberID, CASE WHEN NumOrders < 5 THEN CAST(NumOrders AS CHAR(2))
                 ELSE "5+" END AS NumOrders
  FROM NZO

The casts fix a problem with the types here - the NumOrders column is a numeric type and the result needs to be a string.

Call this list NMC (new members in categories).

Summarize the results

SELECT NumOrders, COUNT(*) AS NumNewMembers
  FROM NMC
 GROUP BY NumOrders
 ORDER BY NumOrders;

Penultimate query

Assembling the various bits and pieces above - and getting the right bits in the right places - yields the following query:

SELECT NumOrders, COUNT(*) AS NumNewMembers
  FROM (SELECT MemberID, CAST(NumOrders AS CHAR(2)) AS NumOrders
          FROM (SELECT M.MemberID, 0 AS NumOrders
                  FROM Members AS M
                 WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
                   AND NOT EXISTS (SELECT * FROM Orders AS O
                                    WHERE YEAR(O.DateCreated) = 2010
                                      AND MONTH(O.DateCreated) = 8
                                      AND O.MemberID = M.MemberID
                                  )
               ) AS WZO
        UNION
        SELECT MemberID, CASE WHEN NumOrders < 5
                         THEN CAST(NumOrders AS CHAR(2))
                         ELSE "5+" END AS NumOrders
          FROM (SELECT M.MemberID, COUNT(*) AS NumOrders
                  FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
                 WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
                   AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
                 GROUP BY M.MemberID
               ) AS NZO
       ) AS NMC
 GROUP BY NumOrders
 ORDER BY NumOrders;

That complete query has been run successfully against IBM Informix Dynamic Server 11.50. For the sample data I generated (see below), I got the result:

numorders   numnewmembers
CHAR(2)     DECIMAL(15,0)
0           1
1           1
2           1
3           1
4           1
5+          2

The general scheme of how to build up the query piecemeal should help you with designing your own queries in future. In particular, you can validate the different segments of the query as you go.

You might find it easier to work the date by creating the first and last days in the month you are interested in, and then running the query for those ranges - which is also more flexible because it could do quarters or half-months or periods straddling two months.

Also note that if there are no new members who place, say, 2 orders in the month they join, then there will be no row in the result. It is possible to fix that problem - it is not easy to fix that problem.

Dealing with 'No new members made N purchases'

There are probably several ways to get a row with a zero count for missing items. The technique I tend to use is to create a table which contains the rows that I want to appear, something like this - where I've created temporary tables to hold the result of each of the named expressions in the main part of the answer. This is a variant of the BaseCounts table shown in the simpler answer; that version did not need the NumNewMembers column whereas this version does.

CREATE TEMP TABLE BaseCounts
(
    NumOrders CHAR(2) NOT NULL,
    NumNewMembers DECIMAL(15,0) NOT NULL
);

INSERT INTO BaseCounts VALUES("0 ", 0);
INSERT INTO BaseCounts VALUES("1 ", 0);
INSERT INTO BaseCounts VALUES("2 ", 0);
INSERT INTO BaseCounts VALUES("3 ", 0);
INSERT INTO BaseCounts VALUES("4 ", 0);
INSERT INTO BaseCounts VALUES("5+", 0);

SELECT NumOrders, MAX(NumNewMembers) AS NumNewMembers
  FROM (SELECT * FROM BaseCounts
        UNION
        SELECT NumOrders, COUNT(*) AS NumNewMembers
          FROM NMC
         GROUP BY NumOrders
        )
 GROUP BY NumOrders
 ORDER BY NumOrders;

The second query in the UNION in the FROM clause is the previous 'final' answer, using a temporary table for the intermediate results.

Ultimate query

When written out to avoid the temp table, the query becomes:

SELECT NumOrders, MAX(NumNewMembers)
  FROM (SELECT * FROM BaseCounts
        UNION
        SELECT NumOrders, COUNT(*) AS NumNewMembers
          FROM (SELECT MemberID, CAST(NumOrders AS CHAR(2)) AS NumOrders
                  FROM (SELECT M.MemberID, 0 AS NumOrders
                          FROM Members AS M
                         WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
                           AND NOT EXISTS (SELECT * FROM Orders AS O
                                            WHERE YEAR(O.DateCreated) = 2010
                                              AND MONTH(O.DateCreated) = 8
                                              AND O.MemberID = M.MemberID
                                          )
                       ) AS WZO
                UNION
                SELECT MemberID, CASE WHEN NumOrders < 5
                                 THEN CAST(NumOrders AS CHAR(2))
                                 ELSE "5+" END AS NumOrders
                  FROM (SELECT M.MemberID, COUNT(*) AS NumOrders
                          FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
                         WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
                           AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
                         GROUP BY M.MemberID
                       ) AS NZO
               ) AS NMC
         GROUP BY NumOrders
       )
 GROUP BY NumOrders
 ORDER BY NumOrders;

When using the modified data set, I get the result:

NumOrders    NumNewMembers
CHAR(2)      DECIMAL(15,0)
0            1
1            1
2            1
3            0
4            2
5+           2

Some DBMS provide other, possibly more convenient, ways to create table values like the BaseCounts table.

One alternative technique that could be considered is some sort of outer join using 'COUNT(column)' instead of 'COUNT(*)'. When you use 'COUNT(column)', the query only counts the rows with a non-null value for 'column', so an outer join that generates a null in 'column' will yield 'COUNT(column)' of zero for the null. However, you still need a reference list from somewhere of the rows that should appear in the output so that you can determine when something is missing from the data set. This is provided by the BaseCounts table in my exposition.

WITH clause

Also, as noted at the top, the SQL standard and some DBMS provide a WITH clause that allows you to create named intermediate results that can then be used in the final query (or, indeed, later on in the WITH clause):

WITH <name1> AS (<query1>),
     <name2>(<named-columns>) AS (<query2>),
     ...
SELECT ... FROM <name1> JOIN <name2> ON ... 

Using this, we could write the following (untested) SQL:

WITH NZO AS (
        SELECT M.MemberID, COUNT(*) AS NumOrders
          FROM Members AS M JOIN Orders AS O ON M.MemberID = O.MemberID
         WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
           AND YEAR(O.DateCreated) = 2010 AND MONTH(O.DateCreated) = 8
         GROUP BY M.MemberID),
     WZO AS (
        SELECT M.MemberID, 0 AS NumOrders
          FROM Members AS M
         WHERE YEAR(M.DateCreated) = 2010 AND MONTH(M.DateCreated) = 8
           AND NOT EXISTS (SELECT * FROM Orders AS O
                            WHERE YEAR(O.DateCreated) = 2010
                              AND MONTH(O.DateCreated) = 8
                              AND O.MemberID = M.MemberID
                          )),
     NMC AS (
        SELECT MemberID, CAST(NumOrders AS CHAR(2))
          FROM WZO
        UNION
        SELECT MemberID, CASE WHEN NumOrders < 5
                         THEN CAST(NumOrders AS CHAR(2))
                         ELSE "5+" END AS NumOrders
          FROM NZO),
     NZC AS (
        SELECT NumOrders, COUNT(*) AS NumNewMembers
          FROM NMC
         GROUP BY NumOrders)
SELECT NumOrders, MAX(NumNewMembers)
  FROM (SELECT * FROM NZC
        UNION
        SELECT * FROM BaseCounts
       )
 GROUP BY NumOrders
 ORDER BY NumOrders;

Sample Data

Tables

CREATE TABLE Members
(
    MemberID INTEGER NOT NULL PRIMARY KEY,
    DateCreated DATE NOT NULL
);

CREATE TABLE Orders
(
    OrderID INTEGER NOT NULL PRIMARY KEY,
    DateCreated DATE NOT NULL,
    MemberID INTEGER NOT NULL REFERENCES Members
);

Members

INSERT INTO Members VALUES(1, '2009-08-03');
INSERT INTO Members VALUES(2, '2010-08-03');
INSERT INTO Members VALUES(3, '2010-08-05');
INSERT INTO Members VALUES(4, '2010-08-13');
INSERT INTO Members VALUES(5, '2010-08-15');
INSERT INTO Members VALUES(6, '2010-08-23');
INSERT INTO Members VALUES(7, '2010-08-23');
INSERT INTO Members VALUES(8, '2010-08-23');
INSERT INTO Members VALUES(9, '2010-09-03');

Orders

INSERT INTO Orders VALUES(11, '2010-08-03', 1);

INSERT INTO Orders VALUES(33, '2010-08-03', 3);

INSERT INTO Orders VALUES(44, '2010-08-05', 4);
INSERT INTO Orders VALUES(45, '2010-08-06', 4);

INSERT INTO Orders VALUES(56, '2010-08-11', 5);
INSERT INTO Orders VALUES(57, '2010-08-13', 5);
INSERT INTO Orders VALUES(58, '2010-08-23', 5);
--For testing 0 members with 3 orders (and 2 with 4 orders), add:
--INSERT INTO Orders VALUES(51, '2010-08-09', 5);

INSERT INTO Orders VALUES(61, '2010-08-05', 6);
INSERT INTO Orders VALUES(62, '2010-08-15', 6);
INSERT INTO Orders VALUES(63, '2010-08-15', 6);
INSERT INTO Orders VALUES(64, '2010-08-25', 6);

INSERT INTO Orders VALUES(71, '2010-08-03', 7);
INSERT INTO Orders VALUES(72, '2010-08-03', 7);
INSERT INTO Orders VALUES(73, '2010-08-03', 7);
INSERT INTO Orders VALUES(74, '2010-08-03', 7);
INSERT INTO Orders VALUES(75, '2010-08-03', 7);

INSERT INTO Orders VALUES(81, '2010-08-03', 8);
INSERT INTO Orders VALUES(82, '2010-08-03', 8);
INSERT INTO Orders VALUES(83, '2010-08-03', 8);
INSERT INTO Orders VALUES(84, '2010-08-03', 8);
INSERT INTO Orders VALUES(85, '2010-08-03', 8);
INSERT INTO Orders VALUES(86, '2010-08-03', 8);

INSERT INTO Orders VALUES(91, '2010-09-03', 9);
Jonathan Leffler
Thank you Jonathan. That's a lot of detail and as I get through these reports I'm doing I'll probably come back to this..
Bill