views:

90

answers:

4

Hi, I am looking to set up 6 groups into which customers would fall into:

Non-purchaser (never bought from us)
New purchaser (purchased for the first time within the current financial year)
Reactivated purchaser (purchased in the current financial year, and also in the 2nd most recent year)
Lapsed purchaser (purchased in the prior financial year but not the current one)
2 yr Consecutive purchaser (has purchased in the current financial year and the most recent one)
3-4 yr consecutive purchaser (has purchased in every year for the last 3 or 4 financial years)
5+ year consecutive purchaser (has purchased in every financial year for a minimum of 5 years)

The financial year I would be using would be from 1st april to 31st march, and would use the following tables:

purchaser (including id (primary key))

purchases (date_purchased, purchases_purchaser_id)

Where the tables are joined on purchaser_id = purchases_purchaser_id and each purchaser can have multiple purchases withn any financial year (so could presumably be grouped by year as well)

It's been driving me mad so any help would be majorly appreciated!!!

Thanks, Davin

A: 

Take small steps at a time

  • Refactor the logic into one or more user defined functions (e.g. CREATE function dbo.GetConsumerGroup(@purchaser_id). If the logic to determine each classification is difficult, refactor these into smaller functions as well, e.g. something like dbo.HasPurchaserMadePurchaseBetween(@DateFrom, @DateTo) can be used for most of your groups.
  • Order of precedence is going to be important - you will probably keep flags for each of the last 5 years groups and use these, e.g. All true = 5 year, last one, last two + last 3-4 = 3-4 year consecutive purchaser etc.

The downside is that using this UDF in a where clause is likely to result in abysmal performance. In which case you will need to schedule this calculation for your purchasers at an appropriate time and then persist it

nonnb
Unfortunately I'm not able to create UDFs as I don't have permission, and so even if there could be one created that would work efficiently this isn't something I could really use...I am considering created a series of temporary tables for each of the groups and then joing this onto the main table - I presume that this would also have a horrendous run time?
Davin
That's procedural approach and should be avoided in queries. You should think how to process sets not rows.
Niikola
+1  A: 

Although it COULD be done a bit easier with another table of date ranges showing the 5 fiscal years, I have hard-coded the from/to date references for your query and appears to be working...

The INNER Select will pre-gather a "flag" based on any 1 or more purchase within the given date range... ex: Apr 1, 2010 = "20100401" for date conversion to Mar 31, 2011 = "20110331", and cycle through last 5 years... Additionally, a flag to count for ANY with a date purchase within the actual purchases table to confirm a "never purchased" vs someone purchasing 6, 7 or older years history...

That queries' basis will basically create a cross-tab of possible individual years where activity has occurred. I can then query with the most detailed criteria for some caption of their classification down to the least...

I converted from another SQL language as best as possible to comply with SQL-Server syntax (mostly about the date conversion), but otherwise, the principle and queries do work... The final classification column is character, but can be whatever you want to supercede.

SELECT
      id,
      CASE 
         WHEN year1 + year2 + year3 + year4 + year5 = 5 THEN "5+yrs "
         WHEN year1 + year2 + year3 + year4 >= 3 THEN "3-4yrs"
         WHEN year1 + year2 = 2, "2yrs  "
         WHEN year1 = 1 AND year2 = 0 AND year3 = 1 THEN "Reacti"
         WHEN year1 = 1 THEN "New   "
         WHEN year1 = 0 AND year2 = 1 THEN "Lapsed"
         WHEN AnyPurchase = 1, "over5"
         ELSE "never" BuyerClassification
      END
   FROM
      ( SELECT
            id,
            MAX( CASE WHEN date_purchased >= CONVERT( Date, "20100401", 112 ) 
                       AND date_purchased <= CONVERT( Date, "20110331", 112 ) 
                 THEN 1 ELSE 0 END ) Year1,
            MAX( CASE WHEN date_purchased >= CONVERT( Date, "20090401", 112 ) 
                       AND date_purchased <= CONVERT( Date, "20100331", 112 )
                 THEN 1 ELSE 0 END ) Year2,
            MAX( CASE WEHEN date_purchased >= CONVERT( Date, "20080401", 112 ) 
                       AND date_purchased <= CONVERT( Date, "20090331", 112 )
                 THEN 1 ELSE 0 END ) Year3,
            MAX( CASE WHEN date_purchased >= CONVERT( Date, "20070401", 112 ) 
                       AND date_purchased <= CONVERT( Date, "20080331", 112 )
                 THEN 1 ELSE 0 END ) Year4,
            MAX( CASE WHEN date_purchased >= CONVERT( Date, "20060401", 112 ) 
                       AND date_purchased <= CONVERT( Date, "20070331", 112 )
                 THEN 1 ELSE 0 END ) Year5,
            MAX( CASE WHEN date_purchased <= CONVERT( Date, "20100401", 112 )
                 THEN 1 ELSE 0 END ) AnyPurchase
         FROM
            purchaser LEFT OUTER JOIN purchases
               ON purchaser.id = purchases.purchases_purchaser_id
         GROUP BY 
            1 ) PreGroup1

EDIT -- fixed parens via syntax conversion and missed it...

The "Group By 1" refers to doing a group by the first column in the query which is the purchaser's ID from the purchaser. By doing a left-outer join will guarantee all possible people in the purchasers table regardless of having any actual purchases. The "PreGroup1" is the "alias" of the select statement just in case you wanted to do other joins subsequent in the outer most select where detecting the year values for classification.

Although it will work, but may not be as efficient as others have chimed-in on by doing analysis of the query, it may open your mind to some querying and aggregating techniques. This process is basically creating a sort-of cross-tab by utilization of case/when construct on the inner SQL-Select, and final classification in the OUTER most SQL-Select.

DRapp
Thanks, that looks awesome!sorry for the stupidity of this ask, but what does the GROUP BY 1) pregroup refer to? should this group by purchaser.id instead?Also would it be possible to get the date to be a rolling 5 years instead of hardcoded (for example something like:SET @year = CONVERT(NVARCHAR(7), YEAR(GETDATE()))SET @pyear = CONVERT(NVARCHAR(7), YEAR(GETDATE())) - 1SET @yr_start = CAST('01/04/'+ @pyear AS DATETIME)SET @yr_end = CAST('31/03/'+ @year AS DATETIME)As I'm new to SQL not sure if this is nonsense though!Massive thanks again,Davin :)
Davin
- your For BuyerClassification condition has to be modified (does not reflect initial conditions)- Your method performs 10 table scans of purchases table, which is not efficient if you have huge number of rows- syntax is not correct: some THEN clause missing, Alias BuyerClassification should go after END clause, not before, Max should have closing parenthesis (between END and column alias).
Niikola
TIME/IO Comparison (1000000 rows in purchases):DRAppTable 'purchases'. Scan count 10, logical reads 5774, ...Table 'purchasers'. Scan count 1, logical reads 2, ...CPU time = 1669 ms, elapsed time = 1678 ms.Niikola:Table 'Worktable'. Scan count 0, logical reads 0, ...Table 'purchases'. Scan count 1, logical reads 2598, ...Table 'purchasers'. Scan count 1, logical reads 2, ...CPU time = 827 ms, elapsed time = 798 ms.
Niikola
+1  A: 

MS SQL Server (works on 2000, 2005, 2008)

SET NOCOUNT ON

CREATE TABLE #purchasers (purchaser_id int, pName varchar(20))

Insert Into #purchasers values (0, 'Non-purchaser')
Insert Into #purchasers values (1, 'New purchaser')
Insert Into #purchasers values (2, 'Reactivated')
Insert Into #purchasers values (3, 'Lapsed')
Insert Into #purchasers values (4, '2 yr Consecutive')
Insert Into #purchasers values (5, '3 yr consecutive')
Insert Into #purchasers values (6, '4 yr consecutive')
Insert Into #purchasers values (7, '5+ year consecutive')
Insert Into #purchasers values (8, 'Uncategorized')
Insert Into #purchasers values (9, 'old one')


CREATE TABLE #purchases (date_purchased datetime, purchases_purchaser_id int)

Insert Into #purchases values ('2010/05/03', 1)

Insert Into #purchases values ('2007/05/03', 2)
Insert Into #purchases values ('2008/05/03', 2)
Insert Into #purchases values ('2010/05/03', 2)

Insert Into #purchases values ('2008/05/03', 3)
Insert Into #purchases values ('2009/05/03', 3)


Insert Into #purchases values ('2009/05/03', 4)
Insert Into #purchases values ('2010/05/03', 4)

Insert Into #purchases values ('2008/05/03', 5)
Insert Into #purchases values ('2009/05/03', 5)
Insert Into #purchases values ('2010/05/03', 5)

Insert Into #purchases values ('2007/05/03', 6)
Insert Into #purchases values ('2008/05/03', 6)
Insert Into #purchases values ('2009/05/03', 6)
Insert Into #purchases values ('2010/05/03', 6)

Insert Into #purchases values ('2004/05/03', 7)
Insert Into #purchases values ('2005/05/03', 7)
Insert Into #purchases values ('2006/05/03', 7)
Insert Into #purchases values ('2007/05/03', 7)
Insert Into #purchases values ('2008/05/03', 7)
Insert Into #purchases values ('2009/05/03', 7)
Insert Into #purchases values ('2009/05/03', 7)
Insert Into #purchases values ('2009/05/03', 7)
Insert Into #purchases values ('2010/05/03', 7)

Insert Into #purchases values ('2007/05/03', 8)

Insert Into #purchases values ('2000/05/03', 9)

Select p.pName,
       p.purchaser_id,
       isNull(a.[2005],0) as [Bef.2006],
       isNull(a.[2006],0) as [2006],
       isNull(a.[2007],0) as [2007],
       isNull(a.[2008],0) as [2008],
       isNull(a.[2009],0) as [2009],
       isNull(a.[2010],0) as [2010],
       isNull(a.Category, 'Non-purchaser') as Category
From #purchasers p
Left Join 
(        
   Select purchases_purchaser_id, [2005],[2006],[2007],[2008],[2009],[2010],
         Case When [2006]+[2007]+[2008]+[2009]+[2010] = 5 Then '5+ year consecutive'
              When [2008]+[2009]+[2010] = 3               Then '3-4 yr consecutive'
              When [2009]+[2010] = 2                      Then '2 yr Consecutive'
              When [2009]=1 and [2010]=0                  Then 'Lapsed'
              When [2008]=1 and [2009]=0 and [2010]=1     Then 'Reactivated'
              When [2006]+[2007]+[2008]+[2009]=0 and [2010]=1 Then 'New'
              When [2006]+[2007]+[2008]+[2009]+[2010] = 0 Then 'Non-purchaser in last 5 yrs'
              Else 'non categorized'
         End as Category     
   From (
            Select purchases_purchaser_id, 
                   Case When date_purchased < '2006/04/01' Then 2005 
                        Else Year(date_purchased)- Case When month(date_purchased)<4 Then -1 else 0 end 
                     end as fiscalYear, count(*) as nPurchases
            From #purchases 
            Group by purchases_purchaser_id,
                     Case When date_purchased < '2006/04/01' Then 2005 
                        Else Year(date_purchased)- Case When month(date_purchased)<4 Then -1 else 0 end 
                     end
         ) as AggData
   PIVOT ( count(nPurchases) for fiscalYear in ([2005],[2006],[2007],[2008],[2009],[2010]) ) pvt
) as a
on p.purchaser_id=a.purchases_purchaser_id
Niikola
Thanks so much for this :)I'm afraid I'm a bit confused as to how I'd actually implement this, as the data pulled all refers to the temporary tables which have been prepopulated? Would I not need to actually create the temporary tables when redirecting this to my actual data source?Sorry if that's a stupid ask!
Davin
Those temporary tables just simulate your data. You should change #tablenames with your real table names. I created those tables and populated data just to make testing easier - you can copy whole code and execute it to see if query returns what you expect. BTW. there are some situations that are not covered, and i marked them as "uncategorized".
Niikola
There's also difference between Purchaser who never purchased and who didn't purchase in last 5 years. If you want to cover just current and previous 4 years (ignoring everything earlier), you can make query faster adding WHERE clause after FROM #purchases: WHERE date_purchased >= '2006/04/01'. For query below it would be WHERE date_purchased >= @OlderThan5yrs. In both queries you can remove columns [2005] and [5].
Niikola
+2  A: 

Here is dynamic version

Declare @currentYear int
Declare @OlderThan5yrs datetime 

Set @currentYear  = Year(GetDate()) - Case When month(GetDate())<4 then 1 else 0 end
Set @OlderThan5yrs = cast(cast( @currentYear-5 as varchar(4))+'/04/01' as datetime)

Select p.pName,
       p.purchaser_id,
       isNull(a.[5+YrAgo],0) as [5+YrAgo],
       isNull(a.[4YrAgo], 0)  as [4YrAgo],
       isNull(a.[3YrAgo], 0)  as [3YrAgo],
       isNull(a.[2YrAgo], 0)  as [2YrAgo],
       isNull(a.[1YrAgo], 0)  as [1YrAgo],
       isNull(a.[CurYr],  0)   as [CurYr],
       isNull(a.Category, 'Non-purchaser (ever)') as Category
From purchasers p
Left Join 
(        
   Select purchases_purchaser_id, 
          [5] as [5+YrAgo],
          [4] as [4YrAgo],
          [3] as [3YrAgo],
          [2] as [2YrAgo],
          [1] as [1YrAgo],
          [0]  as [CurYr],
         Case When [4]+[3]+[2]+[1]+[0] = 5     Then '5+ year consecutive'
              When [2]+[1]+[0] = 3             Then '3-4 yr consecutive'
              When [1]+[0] = 2                 Then '2 yr Consecutive'
              When [1]=1 and [0]=0             Then 'Lapsed'
              When [2]=1 and [1]=0 and [0]=1   Then 'Reactivated'
              When [4]+[3]+[2]+[1]=0 and [0]=1 Then 'New'
              When [4]+[3]+[2]+[1]+[0] = 0     Then 'Non-purchaser (last 5 yrs)'
              Else 'non categorized'
         End as Category     
   From (
            Select purchases_purchaser_id, 
                   Case When date_purchased < @OlderThan5yrs Then 5 
                        Else @currentYear - Year(date_purchased)+ Case When month(date_purchased)<4 Then 1 else 0 end 
                     end as fiscalYear, count(*) as nPurchases
            From purchases 
            Group by purchases_purchaser_id,
                   Case When date_purchased < @OlderThan5yrs Then 5 
                        Else @currentYear - Year(date_purchased)+ Case When month(date_purchased)<4 Then 1 else 0 end 
                     end
         ) as AggData
   PIVOT ( count(nPurchases) for fiscalYear in ([5],[4],[3],[2],[1],[0]) ) pvt
) as a
on p.purchaser_id=a.purchases_purchaser_id

UPDATED:

Here is result with data I inserted in previous query (You will have to add # to table names in the query).

pName                purchaser_id 5+YrAgo 4YrAgo 3YrAgo 2YrAgo 1YrAgo CurYr Category
-------------------- ------------ ------- ------ ------ ------ ------ ----- --------------------------
Non-purchaser                   0       0      0      0      0      0     0 Non-purchaser (ever)
New purchaser                   1       0      0      0      0      0     1 New
Reactivated                     2       0      0      1      1      0     1 Reactivated
Lapsed                          3       0      0      0      1      1     0 Lapsed
2 yr Consecutive                4       0      0      0      0      1     1 2 yr Consecutive
3 yr consecutive                5       0      0      0      1      1     1 3-4 yr consecutive
4 yr consecutive                6       0      0      1      1      1     1 3-4 yr consecutive
5+ year consecutive             7       1      1      1      1      1     1 5+ year consecutive
Uncategorized                   8       0      0      1      0      0     0 non categorized
old one                         9       1      0      0      0      0     0 Non-purchaser (last 5 yrs)

You also don't need columns [5+YrAgo], [4YrAgo], [3YrAgo], [2YrAgo], [1YrAgo] and [CurYr]. I added them to be easier to check query logic.

UPDATE 2

Below is query you asked in comment. Note table structures I've used in query are:

Table purchasers ( purchaser_id int, pName varchar(20))
Table purchases (purchases_purchaser_id int, date_purchased datetime)

and there is Foreign key on purchases (purchases_purchaser_id) referencing purchases (purchaser_id).

;With AggData as (
Select   purchases_purchaser_id, 
         Case When [4]+[3]+[2]+[1]+[0] = 5     Then 1 end as [Consec5],
         Case When [4]=0 and [2]+[1]+[0] = 3   Then 1 end as [Consec34],
         Case When [2]=0 and [1]+[0] = 2       Then 1 end as [Consec2],
         Case When [1]=1 and [0]=0             Then 1 end as [Lapsed],
         Case When [2]=1 and [1]=0 and [0]=1   Then 1 end as [Reactivated],
         Case When [4]+[3]+[2]+[1]=0 and [0]=1 Then 1 end as [New],
         Case When [4]+[3]+[2]>0 and [1]+[0]=0 Then 1 end as [Uncateg]
   From  (
            Select purchases_purchaser_id, 
                   @currentYear - Year(date_purchased) + Case When month(date_purchased)<4 Then 1 else 0 end as fiscalYear, 
                   count(*) as nPurchases
              From purchases      
             Where date_purchased >= @OlderThan5yrs  
             Group by purchases_purchaser_id,
                      @currentYear - Year(date_purchased) + Case When month(date_purchased)<4 Then 1 else 0 end
         ) as AggData
   PIVOT ( count(nPurchases) for fiscalYear in ([4],[3],[2],[1],[0]) ) pvt
)
Select count([Consec5])     as [Consec5],
       count([Consec34])    as [Consec34],
       count([Consec2])     as [Consec2],
       count([Lapsed])      as [Lapsed],
       count([Reactivated]) as [Reactivated],
       count([New])         as [New],
       count(*)-count(a.purchases_purchaser_id) as [Non],
       count([Uncateg])     as [Uncateg]
  From purchasers p
 Left Join AggData as a
  on p.purchaser_id=a.purchases_purchaser_id              

Result (With test data from previous post)

Consec5 Consec34 Consec2 Lapsed Reactivated New Non Uncateg
------- -------- ------- ------ ----------- --- --- -------
      1        2       1      1           1   1   2       1
Niikola
wow! Thanks yet again for this (didn't see it before my earlier reply to you) For the dynamic version of your query its coming up with an error message 'invalid object name 'purchasers'Also could you clarify why p.purchaser_id=a.purchases_purchaser_id (at the end) is not:p.purchaser_id=a.purchases.purchaser_id instead? As 'purchases' is a separate table entirely? would it throw up errors if you gave the purchases table an alias like ps?As i asked above, if you could clarify how the p.pName would fit into my pre-existing database that would be awesome :)
Davin
As I said before, you have to modify table and column names to fit your schema. If in purchases table purchaser id column name is same (purchaser_id), then just replace all purchases_purchaser_id with purchaser_id (a.purchases_purchaser_id will be a.purchaser_id).
Niikola
pName I added just to be easier to check if we get what we wanted (I inserted data to match purchaser name :-). You can use Purchaser Name column if you have, or you can just remove that column from select.
Niikola
Sorry to keep on asking questions, but how would I then perform a count of the totals for each pName (i.e. so thatinstead of a separate row for each customer, there are only a total of nine rows, 1 for each pName category) - as I;m not used to pivoting I'm not sure where I'd need to add the Count function!!Yet again, massive thanks, this is a true lifesaver :)
Davin
I updated post with totals. By the way, pName is Purchaser Name, not category - I've just put the names to be the same as expected category, but in real life it has to be person/company name (it's in purchasers table).
Niikola
One more note: I still keep category 'uncategorized' which contains all purchasers that don't have purchases in current and previous year but they have at least one in 2,3 or 4 years ago. If you want them to be in 'Lapsed' you should modify condition for it as: [4]+[3]+[2]+[1]>1 and [0]=0 and to remove [Uncateg] column.
Niikola
That is fantastic and works amazingly - your help with all this has been awesome and is truly appreciated, and I would definitely vote you up if I could! :)
Davin