views:

102

answers:

5

Let's assume we have a table Maintenance

Customer LastLogin ActionType
1        12/1/2007 2
1        12/2/2007 2
etc.

We want a list of all customers who at any point during a given year had one or more uninterrupted sequences, 14 days long, of login with action type 2.

I can of course easily do this with code, and even have it be fairly quick over small sets. Is there a non-cursor way to do it in SQL?

Microsoft Sql 2008.

+1  A: 

EDIT: This would have worked for the original question about two in a row. 14 in a row is a different answer

First you need a sequence so you are going to use ROWNUMBER

You can do a SELF-JOIN Maintenace to itself using ROWNUMBER = ROWNUMBER + 1

Any two consequitive rows with the same customer id, and both rows with "2" ActionType are will give you the list CUSTOMER as your answer.

Try this

WITH Maintenance AS
(
SELECT 1 as Customer, CONVERT (DateTime, '1/1/2008') DateTimeStamp, 1 ActionType
UNION
SELECT 1, '3/1/2009', 1
UNION
SELECT 1, '3/1/2006', 2
UNION
SELECT 2, '3/1/2009', 1
UNION
SELECT 2, '3/1/2006', 2
)
,RowNumberMaintenance AS
(SELECT  ROW_NUMBER () OVER (ORDER BY Customer, DateTimeStamp)  AS RowNumber, *
FROM Maintenance)
SELECT m1.Customer
From RowNumberMaintenance M1
    INNER JOIN RowNumberMaintenance M2
     ON M1.Customer = M2.Customer
     AND M1.RowNumber = M2.RowNumber + 1
WHERE 1=1
     AND M1.ActionType <> 2
     AND M2.ActionType <> 2
Raj More
A: 
select customerID, count(customerID)
from maintenance
where actiontype = 2
group by customerID
having count(customerID) >= 1
Jason
+2  A: 

This will select all customers with at least two consecutive actions of the same type.

WITH    rows AS 
        (
        SELECT  customer, action,
                ROW_NUMBER() OVER (PARTITION BY customer ORDER BY lastlogin) AS rn
        FROM    mytable
        )
SELECT  DISTINCT customer
FROM    rows rp
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    rows rl
        WHERE   rl.customer = rp.customer
                AND rl.rn = rp.rn + 1
                AND rl.action = rp.action
        )

Here's the more efficient query for just action 2:

WITH    rows AS 
        (
        SELECT  customer, ROW_NUMBER() OVER (PARTITION BY customer ORDER BY lastlogin) AS rn
        FROM    mytable
        WHERE   action = 2
        )
SELECT  DISTINCT customer
FROM    rows rp
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    rows rl
        WHERE   rl.customer = rp.customer
                AND rl.rn = rp.rn + 1
        )

Update 2:

To select uninterrupted ranges:

WITH    rows AS 
        (
        SELECT  customer, action, lastlogin
                ROW_NUMBER() OVER (PARTITION BY customer ORDER BY lastlogin) AS rn
                ROW_NUMBER() OVER (PARTITION BY customer, action ORDER BY lastlogin) AS series
        FROM    mytable
        )
SELECT  DISTINCT customer
FROM    (
        SELECT  customer
        FROM    rows rp
        WHERE   action
        GROUP BY
                customer, actioncode, series - rn
        HAVING
                DETEDIFF(day, MIN(lastlogin), MAX(lastlogin)) >= 14
        ) q

This query calculates two series: one returns contiguous ORDER BY lastlogin, the second one partitions by action additionally:

action  logindate rn  series diff = rn - series
1       Jan 01    1   1      0
1       Jan 02    2   2      0
2       Jan 03    3   1      2
2       Jan 04    4   2      2
1       Jan 05    5   3      2
1       Jan 06    6   4      2

As long as the difference between the two schemes is the same, the series are uninterrupted. Each interruption breaks the series.

This means that the combination of (action, diff) defines the uninterrupted groups.

We can group by action, diff, find MAX and MIN within the groups and filter on them.

If you need to select 14 rows rather than 14 consecutive days, just filter on COUNT(*) instead of the DATEDIFF.

Quassnoi
I should have been more specific! How do you do N actions? Specifically 14 (without 14 different subselects preferably)
Russell Steen
Looks good, I'll test it shortly. I definitely need sequence and not just count(*). We're looking for patterns of behavior.
Russell Steen
A: 

I'm going to assume that by a sequence you mean two or more rows with sequential datetime values with no other rows in between for the same user with a different action type. That being the case, this should give you what you're looking to get:

SELECT DISTINCT
     T1.customer
FROM
     Maintenance T1
INNER JOIN Maintenance T2 ON
     T2.customer = T1.customer AND
     T2.action_type = 2 AND
     T2.last_login > T1.last_login
LEFT OUTER JOIN Maintenance T3 ON
     T3.customer = T1.customer AND
     T3.last_login > T1.last_login AND
     T3.last_login < T2.last_login AND
     T3.action_type <> 2
WHERE
     T1.actiontype = 2 AND
     T3.customer IS NULL

The SQL does just what I said above - finds a row (T1) with another row after it (T2) both with action_type = 2 where there is no row in between (T3) with a different action type. The T3.customer IS NULL checks for NULL because if the column is NULL (I'm assuming that it's a NOT NULL column) then it means that the LEFT OUTER JOIN must not have found a row that matches the criteria.

Tom H.
+1  A: 

Use:

WITH dates AS (
  SELECT CAST('2007-01-01' AS DATETIME) 'date'
  UNION ALL
   SELECT DATEADD(dd, 1, t.date) 
     FROM dates t
    WHERE DATEADD(dd, 1, t.date) <= GETDATE())
   SELECT m.customer, 
          m.actiontype
     FROM dates d
LEFT JOIN MAINTENANCE m ON m.last_login = d.date
    WHERE m.last_login IS NULL
OMG Ponies