tags:

views:

184

answers:

4

This one is nasty complicated to solve.

I have a table containing date ranges, each date range has a priority. Highest priority means this date range is the most important.

Or in SQL

create table #ranges (Start int, Finish int,  Priority int) 


insert #ranges values (1 , 10, 0)
insert #ranges values (2 , 5 , 1)
insert #ranges values (3 , 4 , 2)
insert #ranges values (1 , 5 , 0)
insert #ranges values (200028, 308731, 0)


Start       Finish      Priority
----------- ----------- -----------
1           10          0
2           5           1
3           4           2
1           5           0
200028      308731      0

I would like to run a series of SQL queries on this table that will result in the table having no overlapping ranges, it is to take the highest priority ranges over the lower ones. Split off ranges as required, and get rid of duplicate ranges. It allows for gaps.

So the result should be:

Start       Finish      Priority    
----------- ----------- -----------
1           2           0
2           3           1
3           4           2
4           5           1
5          10           0
200028     308731       0

Anyone care to give a shot at the SQL? I would also like it to be as efficient as possible.

A: 

I'm a little confused about what you want to end up with. Is this the same as simply having a set of dates where one range continues until the next one starts (in which case you don't really need the Finish date, do you?)

Or can a range Finish and there's a gap until the next one starts sometimes?

If the range Start and Finish are explicitly set, then I'd be inclined to leave both, but have the logic to apply the higher priority during the overlap. I'd suspect that if dates start getting adjusted, you'll eventually need to roll back a range that got shaved, and the original setting will be gone.

And you'll never be able to explain "how it got that way".

Do you want simply a table with a row for each date, including its priority value? Then when you have a new rule, you can bump the dates that would be trumped by the new rule?

I did a medical office scheduling app once that started with work/vacation/etc. requests with range-type data (plus a default work-week template.) Once I figured out to store the active schedule info as user/date/timerange records, things fell into place a lot more easily. YMMV.

le dorfier
dorfer see my clarified question. just trumping rules is not good enough I need to split stuff. also i need this for modeling reports, so I am not really in control of what gets inserted
Sam Saffron
A: 

Here is something to get you started. It is helpful if you use a calendar table:

CREATE TABLE dbo.Calendar  
(  
    dt SMALLDATETIME NOT NULL 
        PRIMARY KEY CLUSTERED
) 
GO

SET NOCOUNT ON 
DECLARE @dt SMALLDATETIME 
SET @dt = '20000101' 
WHILE @dt < '20200101' 
BEGIN 
    INSERT dbo.Calendar(dt) SELECT @dt 
    SET @dt = @dt + 1 
END
GO

Code to setup the problem:

create table #ranges (Start DateTime NOT NULL, Finish DateTime NOT NULL,  Priority int NOT NULL) 
create table #processed (dt DateTime NOT NULL, Priority int NOT NULL) 

ALTER TABLE #ranges ADD PRIMARY KEY (Start,Finish, Priority)
ALTER TABLE #processed ADD PRIMARY KEY (dt)


declare @day0 datetime,
    @day1 datetime, 
    @day2 datetime, 
    @day3 datetime, 
    @day4 datetime, 
    @day5 datetime

select @day0 = '2000-01-01', 
    @day1 = @day0 + 1, 
    @day2 = @day1 + 1,  
    @day3 = @day2 + 1,
    @day4 = @day3 + 1,
    @day5 = @day4 + 1

insert #ranges values (@day0, @day5, 0)
insert #ranges values (@day1, @day4, 1)
insert #ranges values (@day2, @day3, 2)
insert #ranges values (@day1, @day4, 0)

Actual solution:

DECLARE @start datetime, @finish datetime, @priority int

WHILE 1=1 BEGIN
    SELECT TOP 1 @start = start, @finish = finish, @priority = priority
    FROM #ranges 
    ORDER BY priority DESC, start, finish

    IF @@ROWCOUNT = 0
     BREAK

    INSERT INTO #processed (dt, priority)
     SELECT dt, @priority FROM calendar 
     WHERE dt BETWEEN @start and @finish
     AND NOT EXISTS (SELECT * FROM #processed WHERE dt = calendar.dt)

    DELETE FROM #ranges WHERE @start=start AND @finish=finish AND @priority=priority
END

Results: SELECT * FROM #processed

dt                      Priority
----------------------- -----------
2000-01-01 00:00:00.000 0
2000-01-02 00:00:00.000 1
2000-01-03 00:00:00.000 2
2000-01-04 00:00:00.000 2
2000-01-05 00:00:00.000 1
2000-01-06 00:00:00.000 0

The solution is not in the exact same format, but the idea is there.

beach
beach, see my clarified question
Sam Saffron
+1  A: 

This is most of the way there, possible improvement would be joining up adjacent ranges of the same priority. It's full of cool trickery.

select Start, cast(null as int) as Finish, cast(null as int) as Priority 
into #processed  
from #ranges
union  
select Finish, NULL, NULL 
from #ranges


update p 
set Finish = (
    select min(p1.Start) 
    from #processed p1 
    where p1.Start > p.Start
)
from #processed p 

create clustered index idxStart on #processed(Start, Finish, Priority) 
create index idxFinish on #processed(Finish, Start, Priority) 

update p
set Priority = 
    (
     select max(r.Priority) 
     from #ranges r
     where 
     (
      (r.Start <= p.Start and r.Finish > p.Start) or 
      (r.Start >= p.Start and r.Start < p.Finish)  
     )
    )
from #processed p

delete from #processed
where Priority is null 

select * from #processed
Sam Saffron
Impressive solution!
Jonas Lincoln
A: 

This can be done in 1 SQL (i first made the query in Oracle using lag and lead, but since MSSQL doesn't support those functions i rewrote the query using row_number. I'm not sure if the result is MSSQL compliant, but it should be very close):

with x as (
select rdate   rdate
,      row_number() over (order by rdate)   rn 
from   (
       select start    rdate
       from   ranges
       union
       select finish   rdate
       from   ranges
       )
)
select d.begin
,      d.end
,      max(r.priority)
from   ( 
       select begin.rdate    begin
       ,      end.rdate      end
       from   x      begin
       ,      x      end
       where  begin.rn = end.rn - 1
       )          d
,      ranges     r
where  r.start    <= d.begin
and    r.finish   >= d.end
and    d.begin    <> d.end
group by d.begin
,      d.end
order by 1, 2

I first made a table (x) with all dates. Then I turned this into buckets by joining x with itself and taking 2 following rows. After this I linked all the possible priorities with the result. By taking the max(priority) I get the requested result.