views:

122

answers:

1

I have two tables both of which have columns StartDate and EndDate.

I'm trying to return a single resultset that contains all date ranges from one table (TableA), and all complement date ranges from the other one (TableB).

CREATE TABLE [dbo].[TableA](
    [ID] [int] NOT NULL,
    [StartDate] [datetime] NOT NULL,
    [EndDate] [datetime] NOT NULL
)

CREATE TABLE [dbo].[TableB](
    [ID] [int] NOT NULL,
    [StartDate] [datetime] NOT NULL,
    [EndDate] [datetime] NOT NULL
)

INSERT INTO TableA (ID, StartDate, EndDate) VALUES(1, '4/1/2009', '8/1/2009')
INSERT INTO TableA (ID, StartDate, EndDate) VALUES(1, '10/1/2009', '12/1/2009')
INSERT INTO TableB (ID, StartDate, EndDate) VALUES(1, '1/1/2009', '2/1/2010')

INSERT INTO TableA (ID, StartDate, EndDate) VALUES(2, '4/1/2009', '8/1/2009')
INSERT INTO TableB (ID, StartDate, EndDate) VALUES(2, '1/1/2009', '5/1/2009')
INSERT INTO TableB (ID, StartDate, EndDate) VALUES(2, '7/1/2009', '12/1/2009')

The expected resultset from the three datasets should be:

(ID = 1)
1/1/2009 - 4/1/2009 (from TableB)
4/1/2009 - 8/1/2009 (from TableA)
8/1/2009 - 10/1/2009 (from TableB)
10/1/2009 - 12/1/2009 (from TableA)
12/1/2009 - 2/1/2010 (from TableB)

(ID = 2)
1/1/2009 - 4/1/2009 (from TableB)
4/1/2009 - 8/1/2009 (from TableA)
8/1/2009 - 12/1/2009 (from TableB)

The date ranges are not guaranteed to be continuous, and I can't make any assumptions on how they're overlapping between tables...within each table they can be assumed to not overlap.

I'm having problems wrapping my head around how to split the single date ranges in TableB into multiple pieces to find all the complement "regions" within it in SQL.

Anyone have any suggestions?

+1  A: 

If you create this as a view, I think it does what you want. It uses CTEs, which should be supported by SQL Server 2005, but not earlier.

WITH Timestamps AS (
    SELECT Id, StartDate AS Date FROM TableA
    UNION
    SELECT Id, EndDate AS Date FROM TableA
    UNION
    SELECT Id, StartDate AS Date FROM TableB
    UNION
    SELECT Id, EndDate AS Date FROM TableB
), Timestamps2 AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Id, Date) AS RowNumber, * FROM Timestamps
), Timestamps3 AS (
    SELECT T1.ID, T1.Date AS StartDate, T2.Date AS EndDate
    FROM Timestamps2 AS T1 JOIN Timestamps2 AS T2
    ON T1.RowNumber + 1 = T2.RowNumber AND T1.ID = T2.ID
), IntervalsFromB AS (
    SELECT T.ID, T.StartDate, T.EndDate FROM Timestamps3 AS T
    LEFT JOIN TableA AS A
    ON T.StartDate >= A.StartDate AND T.EndDate <= A.EndDate
    WHERE A.StartDate IS NULL)
SELECT * FROM TableA
UNION ALL
SELECT * FROM IntervalsFromB

Full output (ordered by Id, StartDate for readability):

Id  StartDate               EndDate
1   2009-01-01 00:00:00.000 2009-04-01 00:00:00.000
1   2009-04-01 00:00:00.000 2009-08-01 00:00:00.000
1   2009-08-01 00:00:00.000 2009-10-01 00:00:00.000
1   2009-10-01 00:00:00.000 2009-12-01 00:00:00.000
1   2009-12-01 00:00:00.000 2010-02-01 00:00:00.000
2   2009-01-01 00:00:00.000 2009-04-01 00:00:00.000
2   2009-04-01 00:00:00.000 2009-08-01 00:00:00.000
2   2009-08-01 00:00:00.000 2009-12-01 00:00:00.000

It was pretty complicated for me to implement this, so I'm wondering if anyone can see a simpler way. I might be missing some trick that makes this much simpler. If so, please let me know! Also, you will almost certainly need some indexes on your tables to get this to perform well if you have a lot of rows. Some other optimizations may be possible - I haven't tried for the fastest possible performance, but just to get the correct result.

Mark Byers
You can replace the final `UNION` with a `FULL JOIN`, otherwise the query is correct. See here: http://explainextended.com/2009/11/09/inverting-date-ranges/
Quassnoi
Nice link - it pretty much explains the query I just wrote. I would never have found that by Googling.
Mark Byers
PS, I think my final UNION ALL is correct - that's just the part where I combine the results from TableA and (TableB-TableA). I think the part you are referring to with the FULL JOIN in my query is inside Timestamps3 (yeah, bad names, I'm sorry) where instead I do an 'INNER JOIN'. This kills the two rows with NULLs but I think that's what he wants anyway, so I don't think any change is required.
Mark Byers
You could probably also change some of the 'UNIONs' to 'UNION ALLs' in the first CTE to get better performance. I think it's a minor issue so I ignored this.
Mark Byers
Good stuff. Looks to be working very nicely with the example dataset. I'll give it a go on my real dataset in a sec. Thanks a lot Mark, never would've figured this one on my own.
Gorbachev