views:

53

answers:

2

Hi, Hopefully this isn't a dupe of another question, but I couldn't see it anywhere else - also this is a simplified version of another question I asked, hopefully to get me started on working out how to approach it.

I am looking to work out consecutive ranges of payments where there has been at least one payment in each month.

I have the following sample data

CREATE TABLE #data
(
Contact_reference NVARCHAR(55)
,Date_payment DATETIME
,Payment_value MONEY
)

INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2003-06-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2004-06-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2004-12-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-04-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-05-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-06-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-07-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-08-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-09-08',12.82)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-10-10',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-11-10',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2005-12-10',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2006-01-10',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2006-02-10',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2006-02-28',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2006-04-12',12.8205)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2006-05-10',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-06-11',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-07-10',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-08-09',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-09-10',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-10-09',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-11-09',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2007-12-10',19.2308)
INSERT INTO #data VALUES ('18EC3CD2-3065-4FF4-BE40-000004228590','2008-01-10',19.2308)

And what I would like to be able to do is to work out for each contact the ranges over which they gave consecutively (defined as giving at least once in every calendar month), the number of consecutive payments, the total value per range (and ideally if possible the gap between the current range and the end of the most recent one).

For the test data above my output would look like this:

CREATE TABLE #results
(
contact_reference NVARCHAR(55)
,Range_start DATETIME
,Range_end DATETIME
,Payments INT
,Value MONEY
,months_until_next_payment INT --works out the gap between the range_end date for a group and the range_start date for the next group
)
INSERT INTO #results VALUES('18EC3CD2-3065-4FF4-BE40-000004228590','2003-06-08','2003-06-08',1,12.82,12)
INSERT INTO #results VALUES('18EC3CD2-3065-4FF4-BE40-000004228590','2004-06-08','2004-06-08',1,12.82,6)
INSERT INTO #results VALUES('18EC3CD2-3065-4FF4-BE40-000004228590','2004-12-08','2004-12-08',1,12.82,4)
INSERT INTO #results VALUES('18EC3CD2-3065-4FF4-BE40-000004228590','2005-04-08','2006-02-28',12,153.843,2)
INSERT INTO #results VALUES('18EC3CD2-3065-4FF4-BE40-000004228590','2006-04-12','2008-06-06',27,416.6673,NULL)

I've looked for answers using islands, or iterations but I quite frankly don't even know where to begin applying them to my question, so any help massively appreciated :)

+1  A: 

You can do it using cursor. Language like c#/java are better choice for this problem.

DECLARE @date DATETIME
DECLARE @nextDate DATETIME
DECLARE @rangeStart DATETIME
DECLARE @rangeEnd DATETIME

DECLARE @value decimal(18,2)
DECLARE @valueSum decimal(18,2)
DECLARE @count int

DECLARE @PaymentCursor CURSOR

SET @PaymentCursor = CURSOR FOR
    SELECT Date_payment, Payment_value FROM #data
    ORDER BY Date_payment

OPEN @PaymentCursor
    FETCH NEXT FROM @PaymentCursor INTO @nextDate, @value   
    SET @date = @nextDate
    SET @rangeStart = @nextDate
    SET @valueSum = 0
    SET @count = 0

    WHILE (@@FETCH_STATUS = 0)
    BEGIN       
        FETCH NEXT FROM @PaymentCursor INTO @nextDate, @value

        SET @count = @count + 1
        SET @valueSum = @valueSum + @value      

        IF (DATEDIFF(mm, @date, @nextDate) > 1)
        BEGIN
            SELECT @rangeStart AS RangeStart, @date AS RangeEnd, @count AS Coount, @valueSum AS VALUE, DATEDIFF(mm, @date, @nextDate) AS months_until_next_payment
            SET @valueSum = 0               
            SET @count = 0
            SET @rangeStart = @nextDate
        END

        SET @date = @nextDate           
    END

    SELECT @rangeStart AS RangeStart, @date AS RangeEnd, @count AS Coount, @valueSum AS VALUE, null AS months_until_next_payment

CLOSE @PaymentCursor
DEALLOCATE @PaymentCursor
Branimir
If you have to use a Cursor in SQL, you probably should look at putting the logic into your app instead of in the database.
JNK
@JNK Yes, I agree.
Branimir
@Branimir - thanks for this, but it produces each row as a separate result table (and I want all of them together for easier reference) and it seems to be running very slow (took around 6 minutes to pull 10 records, and I have a good few hundred thousand).
Davin
You can insert result rows in a table, create a table and use INSERT INTO instead of SELECT in WHILE. Performance depends on your data and hardware and using CURSOR couse performance penalty...
Branimir
+3  A: 

Edit: I've added in the months_until_next_payment column. This would be more efficiently done in the application rather than with a self join however as SQL Server does not have any particularly satisfactory way of referencing next and previous rows.

;WITH base AS ( 
SELECT    Contact_reference  ,
          Payment_value,
          DATEPART(YEAR, Date_payment)*12 + DATEPART(MONTH, Date_payment) - 
               DENSE_RANK() OVER 
                   (PARTITION BY Contact_reference 
                    ORDER BY DATEPART(YEAR, Date_payment)*12 + DATEPART(MONTH, Date_payment)) AS G,
          Date_payment
 FROM     #data
 ),
 cte AS
 (
 SELECT 
          Contact_reference, 
          ROW_NUMBER() over (partition by Contact_reference 
                                 order by MIN(Date_payment)) RN,
          MIN(Date_payment) Range_start,
          MAX(Date_payment) Range_end, 
          COUNT(Payment_value) Payments, 
          SUM(Payment_value) Value
 FROM base
 GROUP BY Contact_reference, G
 )
 SELECT 
       c1.Contact_reference, 
       c1.Payments, 
       c1.Range_end, 
       c1.Range_start, 
       c1.Value, 
       DATEDIFF(month, c1.Range_end,c2.Range_start) months_until_next_payment
 FROM cte c1
 LEFT join cte c2 ON c1.Contact_reference=c2.Contact_reference and c2.RN = c1.RN+1
Martin Smith
Sorry for the sheer ignorance of this, but why do you do DATEPART(year) * 12? and also if you could possibly let me know how to begin doing the left join for the months_until_next payment that would be fantastic, as I am a proper n00b with SQL! :) Thanks!
Davin
Instead of another join, you could just add a computed column on the final select statement, altering it to read: SELECT Contact_reference, MIN(Date_payment) Range_start,MAX(Date_payment) Range_end, COUNT(Payment_value) Payments, SUM(Payment_value) Value , (select MIN(G) - q.g + 1 from cte1 where g > q.g) as months_until_next_payment FROM cte1 q GROUP BY Contact_reference, G
Richard Fawcett
DATEPART(year) * 12 + DATEPART(month) is used to get an integer which changes by one every time we advance a calendar month. It's because there are 12 years in a month, when we reset to January, we subtract 11 for the change in month, and add 1*12 for the change in year, meaning the net result is adding one.
Richard Fawcett
Wow, this answer is amazing! Thanks so much for this :)
Davin