views:

126

answers:

2

This is an SQL Question, I think it is difficult one - I'm not sure it is possible to achieve in a simple SQL sentence or a stored procedure:

I want to find the number of the longest sequence of the same (known) number in a column in a table:

example:

TABLE: 
DATE    SALEDITEMS
1/1/09       4
1/2/09       3
1/3/09       3
1/4/09       4
1/5/09       3

calling the sp/sentence for 4 will give 1 calling the sp/sentecne for 3 will give 2 as there was 2 times in a row number 3.

I'm running SQL server 2008.

A: 

Untested, because you did not provide DDL and sample data:

DECLARE @SALEDITEMS INT;
SET @SALEDITEMS=3;
SELECT MAX(cnt) FROM(
SELECT COUNT(*) FROM YourTable JOIN (
SELECT y1.[Date] AS d1, y2.[Date] AS d2
FROM YourTable AS y1 JOIN YourTable AS y2 
ON y1.SALEDITEMS=@SALEDITEMS AND y2.SALEDITEMS=@SALEDITEMS
AND NOT EXISTS(SELECT 1 FROM YourTable AS y 
WHERE y.SALEDITEMS<>@SALEDITEMS
AND y1.[Date] < y.[Date] AND y.[Date] < y2.[Date])
) AS t
WHERE [Date] BETWEEN t.d1 AND t.d2
) AS t;
AlexKuznetsov
I get the idea... it might work, there is an error near the last WHERE not sure what yet
Dani
A: 

UPDATE: I generated a million rows of random data, and abandoned the recursive CTE solution, as its query plan didn't make good use of indexes in the optimizer.

But the non-recursive solution I originaly posted turned out to work great, as long as there was an additional non-clustered index on (SALEDITEMS, [DATE]). This makes sense, since the query needs to filter in both directions (both by date and by SALEDITEMS). With this additional index, queries on a million rows return in under 2 seconds on my (not very beefy) desktop mathine. Without this index, the query was dog-slow.

BTW, this is a great example of how SQL Server's cost-based query optimization totally breaks down in some cases. The recursive CTE solution has a cost (on my PC) of 42 and takes at least several minutes to finish. The non-recursive solution has a cost of 15,446 (!!!) and completes in 1.5 seconds. Moral of the story: when comparing SQL Server query plans, don't assume that cost necessarily correlates to query performance!

Anyway, here's the solution I'd recommend (the same non-recursive CTE I posted earlier) :

DECLARE @SALEDITEMS INT = 3;

WITH SalesNoMatch ([DATE], SALEDITEMS, NoMatchDate) 
AS 
(
 SELECT [DATE], SALEDITEMS, 
  (SELECT MIN([DATE]) FROM Sales s2 WHERE s2.SALEDITEMS <> @SALEDITEMS 
   AND s2.[DATE] > s1.[DATE]) as NoMatchDate
 FROM Sales s1
)
, SalesMatchCount ([DATE], ConsecutiveCount) AS
(
 SELECT [DATE], 1+(SELECT COUNT(1) FROM Sales s2 WHERE s2.[DATE] > s1.[DATE] AND s2.[DATE] < NoMatchDate)
 FROM SalesNoMatch s1
 WHERE s1.SALEDITEMS = @SALEDITEMS 
)
SELECT MAX(ConsecutiveCount) 
FROM SalesMatchCount;

Here's the DDL I used to test this, including indexes you'll need:

CREATE TABLE [Sales](
    [DATE] date NOT NULL,
    [SALEDITEMS] int NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX IX_Sales ON Sales ([DATE]);
CREATE UNIQUE NONCLUSTERED INDEX IX_Sales2 ON Sales (SALEDITEMS, [DATE]);

And here's how I created my test data-- 1,000,001 rows with ascending dates with SALEDITEMS randomly set between 1 and 10.

INSERT INTO Sales ([DATE], SALEDITEMS)
VALUES ('1/1/09', 5)

DECLARE @i int = 0;

WHILE (@i < 1000000)
BEGIN
    INSERT INTO Sales ([DATE], SALEDITEMS)
    SELECT DATEADD (d, 1, (SELECT MAX ([DATE]) FROM Sales)), ABS(CHECKSUM(NEWID())) % 10 + 1

    SET @i = @i + 1;
END

Here's the recursive-CTE solution that I abandoned: DECLARE @SALEDITEMS INT = 3;

-- recursive CTE solution (remember to set MAXRECURSION!)
WITH SalesRowNum ([DATE], SALEDITEMS, RowNum) 
AS 
(
 SELECT [DATE], SALEDITEMS, ROW_NUMBER() OVER (ORDER BY s1.[DATE]) as RowNum
 FROM Sales s1
)
, SalesCTE (RowNum, [DATE], ConsecutiveCount) 
AS 
( 
 SELECT s1.RowNum, s1.[DATE], 1 AS ConsecutiveCount
 FROM SalesRowNum s1 
 WHERE SALEDITEMS = @SALEDITEMS

 UNION ALL 

 SELECT s1.RowNum, s1.[DATE], ConsecutiveCount + 1 AS ConsecutiveCount
 FROM SalesRowNum s1 
 INNER JOIN SalesCTE s2 ON s1.RowNum = s2.RowNum + 1
 WHERE SALEDITEMS = @SALEDITEMS
) 
SELECT MAX(ConsecutiveCount) 
FROM SalesCTE;
Justin Grant
Thanks ! tried the 2nd code (the non recursive) on a small table (1400 rows) it worked. tried it on a 30000 rows table, and it never ends, yet there is no index on the date column so I guess I need to start fixing this first. Can I set MaxRecursion in the statement ? how should I unset it or return it to default value ? will the Recursion work faster ?
Dani
I updated the solution with more info-- the non-recursive solution looks like the better one once the table has many rows. And you'll actually need two indexes-- otherwise (as you discovered) when there's a lot of rows, the query will be really slow. Add the second index (see above) and you'll be able to complete your 30,000-row query in well under a second.
Justin Grant