I have a SQL Server 2005 database which contains a table called Memberships.

The table schema is:

PersonID int, Surname nvarchar(30), FirstName nvarchar(30), Description nvarchar(100), StartDate datetime, EndDate datetime

I'm currently working on a grid feature which shows a breakdown of memberships by person. One of the requirements is to split membership rows wherever their date ranges intersect. The intersection must be bound by the Surname and FirstName, i.e. splits only occur between membership records with the same Surname and FirstName.

Example table data:

18  Smith  John  Poker Club  01/01/2009  NULL
18  Smith  John  Library     05/01/2009  18/01/2009
18  Smith  John  Gym         10/01/2009  28/01/2009
26  Adams  Jane  Pilates     03/01/2009  16/02/2009

Expected result set:

18  Smith  John  Poker Club                  01/01/2009  04/01/2009
18  Smith  John  Poker Club / Library        05/01/2009  09/01/2009
18  Smith  John  Poker Club / Library / Gym  10/01/2009  18/01/2009
18  Smith  John  Poker Club / Gym            19/01/2009  28/01/2009
18  Smith  John  Poker Club                  29/01/2009  NULL
26  Adams  Jane  Pilates                     03/01/2009  16/02/2009

Does anyone have any idea how I could write a stored procedure that returns a result set with the breakdown described above?

A: 

The difficulty with this problem is that T-SQL solutions won't scale well as the data set grows. The script below solves it with a series of temporary tables built on the fly. It splits each date range into its individual days using a numbers table. This is where it won't scale, primarily because of your open-ended NULL end dates, which appear to mean infinity: you have to swap in a fixed date far in the future so that the expansion is limited to a feasible length of time. You could likely see better performance by building a table of days, or a calendar table, with appropriate indexing so that each day can be looked up efficiently.
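As a rough sketch of the calendar-table idea, assuming a permanent table is acceptable in your database: the name dbo.Calendar and the date bounds are my own placeholders, and the table is filled from the same master..spt_values source the script uses, so it covers the same span of days.

```sql
-- Hypothetical calendar table; dbo.Calendar and the date bounds are assumptions.
SET DATEFORMAT dmy;

CREATE TABLE dbo.Calendar
(
    theDate datetime NOT NULL PRIMARY KEY CLUSTERED  -- one row per day, indexed
);

-- Fill it once from the numbers in spt_values (type 'P'),
-- the same source the answer's derived table reads from.
INSERT INTO dbo.Calendar (theDate)
SELECT DATEADD(dd, number, '01/01/2008')
FROM master..spt_values
WHERE type = N'P';

-- The day-splitting step can then join to the indexed table
-- instead of rebuilding the list of days on every run.
SELECT s.PersonID, s.Description, c.theDate
FROM Schedule AS s
JOIN dbo.Calendar AS c
  ON c.theDate >= s.StartDate
 AND c.theDate <= ISNULL(s.EndDate, '31/12/2012');
```

The open-ended ranges still expand to one row per day up to the stand-in end date, so this only makes the lookup cheaper; it does not remove the underlying growth.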

Once the ranges are split, the descriptions are merged using FOR XML PATH so that each day in the series carries all of the descriptions that apply to it. Row numbering by PersonID and date then allows the first and last row of each range to be found with two NOT EXISTS checks: one finds rows where no previous row exists with a matching PersonID and Descriptions value, the other finds rows where no next row exists with a matching PersonID and Descriptions value.

This result set is then renumbered using ROW_NUMBER so that they can be paired up to build the final results.
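To try the merging step on its own first, here is a minimal sketch on a few hand-made rows. The table variable @Days is a hypothetical stand-in for #SplitRanges, and the ORDER BY inside the subquery is my addition, not part of the script: the later steps compare the merged strings between adjacent days, so a fixed concatenation order seems safer.

```sql
-- Illustrative data only; @Days is a hypothetical stand-in for #SplitRanges.
SET DATEFORMAT dmy;

DECLARE @Days TABLE (PersonID int, Description nvarchar(100), theDate datetime);
INSERT INTO @Days VALUES (18, 'Poker Club', '04/01/2009');
INSERT INTO @Days VALUES (18, 'Poker Club', '05/01/2009');
INSERT INTO @Days VALUES (18, 'Library',    '05/01/2009');

SELECT
    d.PersonID,
    d.theDate,
    STUFF((
        SELECT '/' + i.Description       -- each description gets a leading '/'
        FROM @Days AS i
        WHERE i.PersonID = d.PersonID
          AND i.theDate  = d.theDate
        ORDER BY i.Description           -- assumption: keep the order stable
        FOR XML PATH('')                 -- concatenates the rows into one string
    ), 1, 1, '') AS Descriptions         -- STUFF removes the first '/'
FROM @Days AS d
GROUP BY d.PersonID, d.theDate;
```

One thing to watch: FOR XML PATH entitizes characters such as & and <, so a description containing them would come back escaped unless you add the TYPE directive and read the result with .value().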

/*
SET DATEFORMAT dmy
USE tempdb;
GO
CREATE TABLE Schedule
( PersonID int, 
 Surname nvarchar(30), 
 FirstName nvarchar(30), 
 Description nvarchar(100), 
 StartDate datetime, 
 EndDate datetime)
GO
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Library', '05/01/2009', '18/01/2009')
INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/01/2009', '28/01/2009')
INSERT INTO Schedule VALUES (26, 'Adams', 'Jane', 'Pilates', '03/01/2009', '16/02/2009')
GO

*/

-- The date literals below are written dd/mm/yyyy, so the session needs DATEFORMAT dmy
SET DATEFORMAT dmy

SELECT 
 PersonID, 
 Description, 
 theDate
INTO #SplitRanges
FROM Schedule, (SELECT DATEADD(dd, number, '01/01/2008') AS theDate
    FROM master..spt_values
    WHERE type = N'P') AS DayTab
WHERE theDate >= StartDate 
  AND theDate <= isnull(EndDate, '31/12/2012')

SELECT 
 ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS rowid,
 PersonID, 
 theDate, 
 STUFF((
  SELECT '/' + Description
  FROM #SplitRanges AS s
  WHERE s.PersonID = sr.PersonID 
    AND s.theDate = sr.theDate
  FOR XML PATH('')
  ), 1, 1,'') AS Descriptions
INTO #MergedDescriptions
FROM #SplitRanges AS sr
GROUP BY PersonID, theDate


SELECT 
 ROW_NUMBER() OVER (ORDER BY PersonID, theDate) AS ID, 
 *
INTO #InterimResults
FROM
(
 SELECT * 
 FROM #MergedDescriptions AS t1
 WHERE NOT EXISTS 
  (SELECT 1 
   FROM #MergedDescriptions AS t2 
   WHERE t1.PersonID = t2.PersonID 
     AND t1.RowID - 1 = t2.RowID 
     AND t1.Descriptions = t2.Descriptions)
UNION ALL
 SELECT * 
 FROM #MergedDescriptions AS t1
 WHERE NOT EXISTS 
  (SELECT 1 
   FROM #MergedDescriptions AS t2 
   WHERE t1.PersonID = t2.PersonID 
     AND t1.RowID = t2.RowID - 1
     AND t1.Descriptions = t2.Descriptions)
) AS t

SELECT DISTINCT 
 PersonID, 
 Surname, 
 FirstName
INTO #DistinctPerson
FROM Schedule

SELECT 
 t1.PersonID, 
 dp.Surname, 
 dp.FirstName, 
 t1.Descriptions, 
 t1.theDate AS StartDate, 
 CASE 
  WHEN t2.theDate = '31/12/2012' THEN NULL 
  ELSE t2.theDate 
 END AS EndDate
FROM #DistinctPerson AS dp
JOIN #InterimResults AS t1 
 ON t1.PersonID = dp.PersonID
JOIN #InterimResults AS t2 
 ON t2.PersonID = t1.PersonID 
  AND t1.ID + 1 = t2.ID 
  AND t1.Descriptions = t2.Descriptions

DROP TABLE #SplitRanges
DROP TABLE #MergedDescriptions
DROP TABLE #DistinctPerson
DROP TABLE #InterimResults

/*

DROP TABLE Schedule

*/

The above solution also handles gaps between additional Descriptions, so if you were to add another Description for PersonID 18 that leaves a gap:

INSERT INTO Schedule VALUES (18, 'Smith', 'John', 'Gym', '10/02/2009', '28/02/2009')

It will fill the gap appropriately. As pointed out in the comments, you shouldn't have name information in this table; it should be normalized out to a Persons table that can be joined to in the final result. I simulated this other table by using a SELECT DISTINCT to build a temp table for that JOIN.
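For what it's worth, a normalized layout might look roughly like the following. This is only a sketch: the table names Persons and MembershipPeriods and the NOT NULL choices are my assumptions, not something taken from your schema.

```sql
-- Hypothetical normalized layout; table names and constraints are assumptions.
CREATE TABLE Persons
(
    PersonID  int          NOT NULL PRIMARY KEY,
    Surname   nvarchar(30) NOT NULL,
    FirstName nvarchar(30) NOT NULL
);

CREATE TABLE MembershipPeriods
(
    PersonID    int           NOT NULL REFERENCES Persons (PersonID),
    Description nvarchar(100) NOT NULL,
    StartDate   datetime      NOT NULL,
    EndDate     datetime      NULL      -- NULL means the membership is open-ended
);

-- Names are then picked up with a plain join, which is the role
-- #DistinctPerson plays in the script above.
SELECT p.PersonID, p.Surname, p.FirstName, m.Description, m.StartDate, m.EndDate
FROM MembershipPeriods AS m
JOIN Persons AS p
  ON p.PersonID = m.PersonID;
```

With a table like this, the final SELECT of the script could join Persons directly and the SELECT DISTINCT step could be dropped.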

Jonathan Kehayias
A: 

Try this

SET DATEFORMAT dmy
DECLARE @Membership TABLE( 
    PersonID int, 
    Surname  nvarchar(16), 
    FirstName nvarchar(16), 
    Description nvarchar(16), 
    StartDate datetime, 
    EndDate  datetime) 
INSERT INTO @Membership VALUES (18, 'Smith', 'John', 'Poker Club', '01/01/2009', NULL)
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Library', '05/01/2009', '18/01/2009')
INSERT INTO @Membership VALUES (18, 'Smith', 'John','Gym', '10/01/2009', '28/01/2009')
INSERT INTO @Membership VALUES (26, 'Adams', 'Jane','Pilates', '03/01/2009', '16/02/2009')

--Program Starts
declare @enddate datetime
--Handle the extreme case where every EndDate is null (i.e. all memberships for all members are still in progress):
-- in that case use an arbitrary date, e.g. '31/12/2009' here; otherwise add 1 day to the highest EndDate
select @enddate =  case when max(enddate) is null then '31/12/2009' else max(enddate) + 1 end from @Membership

--Fill the null enddates
; with fillNullEndDates_cte as
(
    select
      row_number() over(partition by PersonId order by PersonId) RowNum
      ,PersonId
      ,Surname
      ,FirstName
      ,Description
      ,StartDate
      ,isnull(EndDate,@enddate) EndDate
    from @Membership
)
--Generate a date calendar
, generateCalender_cte as
(
    select 
     1 as CalenderRows
     ,min(startdate) DateValue
    from @Membership
       union all
        select 
      CalenderRows+1
      ,DateValue + 1
        from    generateCalender_cte   
        where   DateValue + 1 <= @enddate
)
--Generate Missing Dates based on Membership
,datesBasedOnMemberships_cte as
 (
    select 
      t.RowNum
      ,t.PersonId
      ,t.Surname
      ,t.FirstName
      ,t.Description   
      , d.DateValue
      ,d.CalenderRows
    from generateCalender_cte d 
    join fillNullEndDates_cte t ON d.DateValue between t.startdate and t.enddate
)
--Generate Description Based On Membership Dates
, descriptionBasedOnMembershipDates_cte as
(
    select    
     PersonID
     ,Surname
     ,FirstName
     ,stuff((
      select '/' + Description
      from datesBasedOnMemberships_cte d1
      where d1.PersonID = d2.PersonID 
      and d1.DateValue = d2.DateValue
      for xml path('')
     ), 1, 1,'') as Description
     , DateValue
     ,CalenderRows
    from datesBasedOnMemberships_cte d2
    group by PersonID, Surname,FirstName,DateValue,CalenderRows
)
--Grouping based on membership dates
,groupByMembershipDates_cte as
(
    select d.*,
    CalenderRows - row_number() over(partition by Description order by PersonID, DateValue) AS  [Group]
    from descriptionBasedOnMembershipDates_cte d
)
select PersonId
,Surname
,FirstName
,Description
,convert(varchar(10), convert(datetime, min(DateValue)), 103) as StartDate
,case when max(DateValue)= @enddate then null else convert(varchar(10), convert(datetime, max(DateValue)), 103) end as EndDate
from groupByMembershipDates_cte 
group by [Group],PersonId,Surname,FirstName,Description
order by PersonId, min(DateValue) --sort on the real date; the StartDate alias is a dd/mm/yyyy string and would sort as text
option(maxrecursion 0)
priyanka.sarkar