views:

175

answers:

4

I have the following problem, that I would like to solve with transact-sql. I have something like this

Start | End | Item
1 | 5 | A
3 | 8 | B

and I want to create something like

Start |End | Item-Combination
1 | 2 | A
3 | 5 | A-B
6 | 8 | B

For the Item-Combination concatenation I already thought of using the FOR XML statement. But in order to create the different new intervals... I really don't know how to approach it. Any idea?

Thanks.

+1  A: 

I had a very similar problem with some computer usage data. I had session data indicating login/logout times. I wanted to find the times (hour of day per day of week) that were the most in demand, that is, the hours where the most users were logged in. I ended up solving the problem client-side using hash tables. For each session, I would increment the bucket for a particular location corresponding to the day of week and hour of day for each day/hour for which the session was active. After examining all sessions the hash table values show the number of logins during each hour for each day of the week.

I think you could do something similar, keeping track of each item seen for each start/end value. You could then reconstruct the table by collapsing adjacent entries that have the same item combination.

And, no, I could not think of a way to solve my problem with SQL either.

tvanfosson
+1  A: 

This is a fairly typical range-finding problem, with the concatenation thrown in. Not sure if the following fits exactly, but it's a starting point. (Cursors are usually best avoided except in the small set of cases where they are faster than set-based solutions, so before the cursor haters get on me please note I use a cursor here on purpose because this smells to me like a cursor-friendly problem -- I typically avoid them.)

So if I create data like this:

CREATE TABLE [dbo].[sourceValues](
    [Start] [int] NOT NULL,
    [End] [int] NOT NULL,
    [Item] [varchar](100) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[sourceValues]  WITH CHECK ADD  CONSTRAINT [End_after_Start] CHECK  (([End]>[Start]))
GO

ALTER TABLE [dbo].[sourceValues] CHECK CONSTRAINT [End_after_Start]
GO

declare @i int; set @i = 0;
declare @start int;
declare @end int;
declare @item varchar(100);
while @i < 1000
begin
    set @start =  ABS( CHECKSUM( newid () ) % 100 ) + 1 ; -- "random" int
    set @end = @start + ( ABS( CHECKSUM( newid () ) % 10 ) ) + 2;  -- bigger random int
    set @item = char( ( ABS( CHECKSUM( newid() ) ) % 5 ) + 65 ); -- random letter A-E
    print @start; print @end; print @item;
    insert into sourceValues( Start, [End], Item) values ( @start , @end, @item );
    set @i += 1;
end

Then I can treat the problem like this: each "Start" AND each "End" value represents a change in the collection of current Items, either adding one or removing one, at a certain time. In the code below I alias that notion as "event," meaning an Add or Remove. Each start or end is like a time, so I use the term "tick." If I make a collection of all the events, ordered by event time (Start AND End), I can iterate through it while keeping a running tally in an in-memory table of all the Items that are in play. Each time the tick value changes, I take a snapshot of that tally:

declare @tick int;
declare @lastTick int;
declare @event varchar(100);
declare @item varchar(100);
declare @concatList varchar(max);
declare @currentItemsList table ( Item varchar(100) );

create table #result ( Start int, [End] int, Items varchar(max) );

declare eventsCursor CURSOR FAST_FORWARD for 
    select tick, [event], item from (
     select start as tick, 'Add' as [event], item from sourceValues as adds
     union all 
     select [end] as tick, 'Remove' as [event], item from sourceValues as removes
    ) as [events] 
    order by tick

set @lastTick = 1
open eventsCursor
fetch next from eventsCursor into @tick, @event, @item  
while @@FETCH_STATUS = 0
BEGIN
    if @tick != @lastTick 
    begin
     set @concatList = ''
     select @concatList = @concatlist + case when len( @concatlist ) > 0 then '-' else '' end + Item 
     from @currentItemsList
     insert into #result ( Start, [End], Items ) values ( @lastTick, @tick, @concatList )
    end

    if @event = 'Add' insert into @currentItemsList ( Item ) values ( @item );
    else if @event = 'Remove' delete top ( 1 ) from @currentItemsList where Item = @item;

    set @lastTick = @tick;
    fetch next from eventsCursor into @tick, @event, @item;
END

close eventsCursor
deallocate eventsCursor

select * from #result order by start
drop table #result

Using a cursor for this special case allows just one "pass" through the data, like a running totals problem. Itzik Ben-Gan has some great examples of this in his SQL 2005 books.

onupdatecascade
A: 

This will exactly emulates and solves the mentioned problem:


-- prepare problem, it can have many rows with overlapping ranges
declare @range table
(
    Item char(1) primary key,
    [Start] int,
    [End] int
)
insert @range select 'A', 1, 5
insert @range select 'B', 3, 8

-- unroll the ranges into helper table
declare @usage table
(
    Item char(1),
    Number int
)

declare
    @Start int,
    @End int,
    @Item char(1)

declare table_cur cursor local forward_only read_only for
    select [Start], [End], Item from @range
open table_cur
fetch next from table_cur into @Start, @End, @Item
while @@fetch_status = 0
begin
    with
    Num(Pos) as -- generate numbers used
    (
     select cast(@Start as int)
     union all 
     select cast(Pos + 1 as int) from Num where Pos < @End
    )
    insert
     @usage
    select
     @Item,
     Pos
    from
     Num
    option (maxrecursion 0) -- just in case more than 100

    fetch next from table_cur into @Start, @End, @Item
end
close table_cur
deallocate table_cur

-- compile overlaps
;
with
overlaps as
(
    select 
     Number, 
     (
      select
       Item + '-' 
      from 
       @usage as i
      where 
       o.Number = i.Number
      for xml path('')
     )
     as Items
    from 
     @usage as o
    group by 
     Number
)
select
    min(Number) as [Start],
    max(Number) as [End],
    left(Items, len(Items) - 1) as Items -- beautify
from
    overlaps
group by
    Items
Irawan Soetomo
A: 

Thanks a lot for all the answers, for the moment I have found a way of doing it. SInce I'm dealing with a datawarehouse, and I have a Time dimension, I could do some joins with Time dimension in the style"inner join DimTime t on t.date between f.start_date and end_date".

It's not very good from the performance point of view, but it seems it's working for me.

I'll give a try to onupdatecascade implementation, to see which suits better for me.

river0
Ehm, a little notch up would be most appreciated, thanks! :)
Irawan Soetomo