views:

249

answers:

2

There's probably any easy solution for this, but I can't see it. I have a table with consecutive dates and often duplicate associated data for several of these consecutive dates:

Date       Col1  Col2
5/13/2010  1     A
5/14/2010  1     A
5/15/2010  2     B
5/16/2010  1     A
5/17/2010  1     A
5/18/2010  3     C
5/19/2010  3     C
5/20/2010  3     C

Using MS T-SQL, I wish to find the start and end dates for each run of distinct Col1 and Col2 values:

StartDate  EndDate    Col1  Col2
5/13/2010  5/14/2010  1     A
5/15/2010  5/15/2010  2     B
5/16/2010  5/17/2010  1     A
5/18/2010  5/20/2010  3     C

Assumptions: There are never any missing dates. Col1 and Col2 are not null. Any ideas - preferably that don't use cursors? Many thanks, -alan

+1  A: 

Here's one approach using outer apply. Replace @t with your table's name.

SELECT    head.date, last.date, head.col1, head.col2
FROM      @t head
OUTER APPLY (
          SELECT TOP 1 *
          FROM @t t
          WHERE t.date < head.date
          ORDER BY t.date desc
          ) prev
OUTER APPLY (
          SELECT TOP 1 *
          FROM @t t
          WHERE t.date > head.date
          AND (t.col1 <> head.col1 or t.col2 <> head.col2)
          ORDER BY t.date
          ) next
OUTER APPLY (
          SELECT TOP 1 *
          FROM @t t
          WHERE (t.date < next.date or next.date is null)
          AND (t.col1 = head.col1 and t.col2 = head.col2)
          ORDER BY t.date
          ) last
WHERE (prev.col1 is null or head.col1 <> prev.col1 or head.col2 <> prev.col2)

The query first selects the "head" row: rows which start a new group of col1, col2. This is done by looking up the "prev"ious row, and saying it must be different in the where clause.

Then it searches for the end of the col1, col2 group. That's a two step process: first search for the the first row of the "next" group, and the row before that is the "last" row.

Date       Col1  Col2
...
5/15/2010  2     B      <-- "prev" row
5/16/2010  1     A      <-- "head" row
5/17/2010  1     A      <-- "last" row
5/18/2010  3     C      <-- "next" row
...

The result of the query matches the example output in your question.

Andomar
Nice solution. I should've mentioned this is for an old sql server 2000 system.
alan s
Alan, you should mark Andomar as the solution if he fits you.
Irawan Soetomo
A: 

For SQL 2005+ I think the below should work

WITH DATES AS
(
   SELECT COL1, COL2, DATE,
      DATEADD(DAY, -1 * ROW_NUMBER() 
      OVER(PARTITION BY COL1, COL2 ORDER BY DATE), DATE) AS GRP
   FROM YOUR_TABLE
)
SELECT COL1, COL2, MIN(DATE) AS STARTDATE, MAX(DATE) AS ENDDATE
FROM DATES
GROUP BY COL1, COL2, GRP

If you have any duplicate records, use DENSE_RANK() instead of ROW_NUMBER()

For SQL 2000 there is a sub query and a co-related query involved.

SELECT COL1, COL2, MIN(DATE) AS STARTDATE, MAX(DATE) AS ENDDATE
FROM (SELECT COL1, COL2, DATE,
    (SELECT MIN(DATE)
     FROM YOUR_TABLE B
     WHERE B.DATE >= A.DATE AND B.COL1 = A.COL1 AND B.COL2 = A.COL2
           AND NOT EXISTS
           (SELECT *
            FROM YOUR_TABLE C
            WHERE C.COL1 = B.COL1 AND C.COL2 = B.COL2
            AND DATEDIFF(DAY, B.DATE, C.DATE) = 1)
    ) AS GRP
    FROM YOUR_TABLE A
)
GROUP BY COL1, COL2, GRP
Chris Bednarski