In SQL Server 2005 I have a table with data that looks something like this:

WTN           Date  
555-111-1212  2009-01-01  
555-111-1212  2009-01-02  
555-111-1212  2009-01-03  
555-111-1212  2009-01-15  
555-111-1212  2009-01-16  
212-999-5555  2009-01-01  
212-999-5555  2009-01-10  
212-999-5555  2009-01-11

From this I would like to extract WTN, Min(Date), and Max(Date). The twist is that I would also like to break whenever there is a gap in the dates, so from the above data my results should look like:

WTN            MinDate     MaxDate  
555-111-1212   2009-01-01  2009-01-03  
555-111-1212   2009-01-15  2009-01-16  
212-999-5555   2009-01-01  2009-01-01  
212-999-5555   2009-01-10  2009-01-11

  1. How can I do this in a SQL Select/ Group By?
  2. Can this be done without a table or list enumerating the values I want to identify gaps in (Dates here)?
+4  A: 

Why is everyone so dead set against using a table for this kind of thing? A table of numbers or a calendar table takes up so little space, and is probably in memory anyway if it is referenced often enough. You can also derive a numbers table pretty easily on the fly using ROW_NUMBER(). Using a numbers table can make the query easier to understand. But here is a not-so-straightforward example, a trick I picked up from Plamen Ratchev a while back. Hope it helps.

DECLARE @wtns TABLE
(
    WTN    CHAR(12),
    [Date] SMALLDATETIME
);

INSERT @wtns(WTN, [Date])
          SELECT '555-111-1212','2009-01-01'
UNION ALL SELECT '555-111-1212','2009-01-02'
UNION ALL SELECT '555-111-1212','2009-01-03'
UNION ALL SELECT '555-111-1212','2009-01-15'
UNION ALL SELECT '555-111-1212','2009-01-16'
UNION ALL SELECT '212-999-5555','2009-01-01'
UNION ALL SELECT '212-999-5555','2009-01-10' 
UNION ALL SELECT '212-999-5555','2009-01-11';

WITH x AS
(
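    SELECT
        [Date],
        wtn,
        -- Consecutive dates share one value: the day number rises by 1
        -- while the descending rank falls by 1, so the sum is constant.
        part = DATEDIFF(DAY, 0, [Date])
             + DENSE_RANK() OVER
               (
                   PARTITION BY wtn
                   ORDER BY [Date] DESC
               )
    FROM @wtns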
)
SELECT 
    WTN, 
    MinDate = MIN([Date]),
    MaxDate = MAX([Date])
FROM
    x
GROUP BY 
    part,
    WTN
ORDER BY
    WTN DESC,
    MaxDate;
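
To see why grouping by part works, it may help to look at the intermediate values. The sketch below is only an illustration, not part of the original query: the table variable @demo and the column aliases DayNo and Rnk are made-up names, and the data is a cut-down copy of the sample rows for one WTN.

```sql
-- Illustrative only: @demo, DayNo and Rnk are hypothetical names.
DECLARE @demo TABLE
(
    WTN    CHAR(12),
    [Date] SMALLDATETIME
);

INSERT @demo(WTN, [Date])
          SELECT '555-111-1212','2009-01-01'
UNION ALL SELECT '555-111-1212','2009-01-02'
UNION ALL SELECT '555-111-1212','2009-01-03'
UNION ALL SELECT '555-111-1212','2009-01-15'
UNION ALL SELECT '555-111-1212','2009-01-16';

SELECT
    WTN,
    [Date],
    -- Whole days since the base date 0 (1900-01-01).
    DayNo = DATEDIFF(DAY, 0, [Date]),
    -- 1 for the latest date of the WTN, 2 for the one before it, and so on.
    Rnk   = DENSE_RANK() OVER (PARTITION BY WTN ORDER BY [Date] DESC),
    -- The grouping key used by the main query.
    part  = DATEDIFF(DAY, 0, [Date])
          + DENSE_RANK() OVER (PARTITION BY WTN ORDER BY [Date] DESC)
FROM @demo
ORDER BY [Date];
```

Within a run of consecutive dates, DayNo goes up by one per row while Rnk goes down by one, so part stays the same. Across a gap, DayNo jumps by more than Rnk drops, so part changes and a new group starts. Because DENSE_RANK gives tied rows the same rank, a duplicate date for the same WTN should land in the same group rather than splitting it.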
Aaron Bertrand
OMG Ponies
But a numbers table is so useful for so many things that you shouldn't need to define it repeatedly. This is better as a permanent table in my view.
HLGEM
Oh no! Defining a table? Populating it? You only define the table and populate it once. Now you can reference that table and not worry about having code for such a CTE in every module where you need a sequence. In theory it will be more efficient than deriving at run time because, as I mentioned before, it will be in memory in most cases, and it should be properly indexed as well. I say in theory because you won't notice much of a performance difference until you hit a certain threshold of numbers / dates.
Aaron Bertrand
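
For readers who have not built one, a one-time setup along the lines the comment above describes might look like the sketch below. The table name dbo.Numbers, the column name n, and the row count of 10000 are assumptions chosen for illustration. It also assumes that sys.all_objects holds at least 100 rows, so that the cross join yields enough rows to number.

```sql
-- Hypothetical permanent numbers table; the names are assumptions.
CREATE TABLE dbo.Numbers
(
    n INT NOT NULL PRIMARY KEY CLUSTERED  -- the clustered key doubles as the index
);

-- Populate once: number the rows of a cross join between two catalog views.
INSERT dbo.Numbers(n)
SELECT TOP (10000)
    n = ROW_NUMBER() OVER (ORDER BY a.[object_id])
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b
ORDER BY n;

-- Example use: every day of January 2009, derived from the numbers.
SELECT [Date] = DATEADD(DAY, n - 1, '20090101')
FROM dbo.Numbers
WHERE n <= 31
ORDER BY n;
```

The same SELECT with ROW_NUMBER() can be dropped into a CTE when a permanent table is not wanted, which is the on-the-fly variant mentioned in the answer.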
Time tables are common in business intelligence applications. Here is a how-to from MSDN: http://msdn.microsoft.com/en-us/library/ms174832.aspx. Once you use a good one you won't mind having an 'extra' table in your db.
jms
@HLGEM: I don't disagree, but APC compared the numbers table vs recursive CTE - the CTE was faster. But the numbers table is the only option for pre-SQL Server 2005.
OMG Ponies
But you create and populate *once*... if you are building your CTE from scratch in every single query you write, I think you are much more likely to make mistakes here and there, than if you reference a table that can be verified at any time by anyone. Anyway a made up CROSS JOIN is much faster than a recursive CTE, at least in my testing: http://is.gd/4rFCp ... but I still think a static numbers and/or calendar table has its place in many environments - maybe not in yours, but that doesn't make a CTE the universal solution.
Aaron Bertrand
I still don't understand why you would populate a temporary table for this. Store it permanently, or generate it on the fly. All a temp table adds is unneeded I/O (though I am sure it is better than some recursive CTE solutions).
Aaron Bertrand
@Aaron: If I've needed a CTE more than once, I turn it into a view. Never said that a CTE was a universal solution...
OMG Ponies
Aaron Bertrand
My bad - it was KM who tested the setup. Here's the SO question: http://stackoverflow.com/questions/1478951/tsql-generate-a-resultset-of-incrementing-dates/1479028#1479028
OMG Ponies
So the recursive CTE *in that specific test* was a fraction faster in operator cost. And you still have to code the CTE every time you want to use it. While I'm all for squeezing every bit of performance out of SQL Server, I need to draw the line when the trade-off means my coding becomes more difficult. I still favor a numbers table (for reasons other than performance) or a cross join over a recursive CTE.
Aaron Bertrand
A: 

Your problem has to do with INTERVAL TYPES and a thing called PACKED NORMAL FORM of a relation.

The issues are discussed at large in "Temporal Data and the Relational Model".

Don't expect any SQL system to really help you with such problems.

Some tutorial systems notwithstanding, the only DBMS that offers decent support for such problems, and that I know of, is my own. No link because I don't want to be doing too much "plugging" here.

Erwin Smout
A: 

You can do this with a GROUP BY, by detecting the boundaries of each run of consecutive dates:

WITH    Boundaries
      AS (
          SELECT    m.WTN
                   ,m.Date
                   ,CASE WHEN p.Date IS NULL THEN 1
                         ELSE 0
                    END AS IsStart
                   ,CASE WHEN n.Date IS NULL THEN 1
                         ELSE 0
                    END AS IsEnd
          FROM      so1590166 AS m
          LEFT JOIN so1590166 AS p
                    ON p.WTN = m.WTN
                       AND p.Date = DATEADD(d, -1, m.Date)
          LEFT JOIN so1590166 AS n
                    ON n.WTN = m.WTN
                       AND n.Date = DATEADD(d, 1, m.Date)
          WHERE     p.Date IS NULL
                    OR n.Date IS NULL
         )
SELECT  l.WTN
       ,l.Date AS MinDate
       ,MIN(r.Date) AS MaxDate
FROM    Boundaries l
INNER JOIN Boundaries r
        ON r.WTN = l.WTN
           AND r.Date >= l.Date
           AND l.IsStart = 1
           AND r.IsEnd = 1
GROUP BY l.WTN
       ,l.Date
Cade Roux
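
The query above reads from a table named so1590166 that is not defined in the answer. To try it, a setup along these lines should work. The column types and the primary key are assumptions, mirroring the table variable used in the earlier answer, and the rows are the sample data from the question.

```sql
-- Assumed definition of so1590166; the types and key are guesses, not from the answer.
CREATE TABLE so1590166
(
    WTN    CHAR(12)      NOT NULL,
    [Date] SMALLDATETIME NOT NULL,
    PRIMARY KEY (WTN, [Date])  -- one row per WTN per day
);

-- Sample rows from the question.
INSERT so1590166(WTN, [Date])
          SELECT '555-111-1212','2009-01-01'
UNION ALL SELECT '555-111-1212','2009-01-02'
UNION ALL SELECT '555-111-1212','2009-01-03'
UNION ALL SELECT '555-111-1212','2009-01-15'
UNION ALL SELECT '555-111-1212','2009-01-16'
UNION ALL SELECT '212-999-5555','2009-01-01'
UNION ALL SELECT '212-999-5555','2009-01-10'
UNION ALL SELECT '212-999-5555','2009-01-11';
```

Note that the self-joins compare dates for exact equality after adding or subtracting one day, so this approach assumes every value is stored at midnight with no time portion. A row such as 212-999-5555 on 2009-01-01, which has neither a predecessor nor a successor, is both a start and an end, so it comes back with MinDate equal to MaxDate.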