views:

739

answers:

4

I have a table that contains a number and a range value. For instance, one column has the value 40 and the other has the value 100, meaning that starting at 40 the range covers 100 values, ending at 139 (inclusive of 40). I want to write a T-SQL statement that expands my data into individual rows.

I think I need a CTE for this, but I don't know how to achieve it.

Note: when expanded, I am expecting around 7 million rows.
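To illustrate, here is a sketch of the kind of table I mean (the names are made up; my actual schema differs):

```sql
-- Illustrative schema and sample data; actual names may differ.
create table MyTable (StartValue int, RangeValue int);

insert into MyTable (StartValue, RangeValue) values (40, 100);

-- Desired expansion: 100 rows containing the values 40, 41, ..., 139.
```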

A: 

You could try this approach:

create function [dbo].[fRange](@a int, @b int)
    returns @ret table (val int)
as
begin
    -- emit @b consecutive values starting at @a
    declare @val int
    declare @end int
    set @val = @a
    set @end = @a + @b
    while @val < @end
    begin
        insert into @ret(val)
        select @val
        set @val = @val + 1
    end
    return
end

go

declare @ranges table(start int, noOfEntries int)

insert into @ranges (start, noOfEntries)
select 40,100
union all select 150, 10

select * from @ranges r
    cross apply dbo.fRange(start,noOfEntries ) fr

Not the fastest approach, but it should work.

kristof
Should have mentioned the resulting output will be around 7m rows. It needs to be fairly quick; I don't think an inline function will cut it, but thank you for the idea.
Coolcoder
In that case you probably need a table pre-populated with the values of your possible ranges, and then use a join on that table. As for the CTE approach, I'm not sure how that would perform. You would most likely use recursion, but you would still need some math operation like +1 to generate the values.
kristof
+1  A: 

I don't know how this could be done with common table expressions, but here is a solution using a temporary table:

SET NOCOUNT ON

DECLARE @MaxValue INT
SELECT @MaxValue = max(StartValue + RangeValue) FROM MyTable

DECLARE @Numbers table (
    Number INT IDENTITY(1,1) PRIMARY KEY
)

INSERT @Numbers DEFAULT VALUES

WHILE COALESCE(SCOPE_IDENTITY(), 0) <= @MaxValue
    INSERT @Numbers DEFAULT VALUES

SELECT n.Number
FROM   @Numbers n
WHERE  EXISTS(
    SELECT *
    FROM   MyTable t
    WHERE  n.Number BETWEEN t.StartValue AND t.StartValue + t.RangeValue - 1
)

SET NOCOUNT OFF

This could be optimized if the Numbers table were a regular (permanent) table, so you don't have to fill it on every call.
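A minimal sketch of that optimization, assuming a permanent `dbo.Numbers` table (the table name and the set-based fill are illustrative, not part of the original answer):

```sql
-- One-time setup (illustrative name): a permanent numbers table.
create table dbo.Numbers (Number int primary key);

-- Fill it once, here using a common cross-join/ROW_NUMBER() trick
-- to generate enough rows to cover the largest expected value.
insert into dbo.Numbers (Number)
select top (16777216) row_number() over (order by (select null))
from sys.all_objects a
cross join sys.all_objects b
cross join sys.all_objects c;

-- Each query then just joins against it; no per-call inserts.
select n.Number
from dbo.Numbers n
where exists (
    select *
    from MyTable t
    where n.Number between t.StartValue and t.StartValue + t.RangeValue - 1
);
```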

splattne
Call me anal, but Temporary Tables (#table) are stored on disk and Table Variables (@table) are stored in memory. So a "temporary table variable" is a bit of an oxymoron :) [Also, does a CLUSTERED index make any difference on a table variable?]
Dems
I rewrote that sample from a real table... so that clustered index came from "copy paste" ;-)
splattne
+1, I think this solution would perform well with the use of a regular table. CoolCoder indicated generating about 7m rows, so inserting that many rows into a table variable on each call would be very time consuming.
kristof
@Dems: Thank you for pointing that out. I edited my answer.
splattne
@Dems, what you say is a common misconception. Table variables *do* create a temporary table behind the scenes; both temp tables and table variables use memory as much as possible. With temp tables you also get statistics and the ability to create indexes.
Thuglife
A: 

I would do something slightly different from splattne...

SET NOCOUNT ON

DECLARE @MaxValue INT
DECLARE @Numbers table (
    Number INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
 )

SELECT @MaxValue = max(RangeValue) FROM MyTable
INSERT @Numbers DEFAULT VALUES

WHILE COALESCE(SCOPE_IDENTITY(), 0) <= @MaxValue
    INSERT @Numbers DEFAULT VALUES

SELECT
    t.startValue + n.Number
FROM
    MyTable t
INNER JOIN
    @Numbers n
        ON n.Number < t.RangeValue

SET NOCOUNT OFF

This will minimise the number of rows you need to insert into the table variable, then use a join to 'multiply' one table by the other...

By the nature of the query, the source table doesn't need indexing, but the "numbers" table should have an index (or primary key). Clustered indexes determine how rows are stored on disk, so I can't see CLUSTERED being relevant on a table variable here, but I left it in as I just copied from splattne.

(Large joins like this may be slow, but still much faster than millions of inserts.)

Dems
Nice! But I think the result will omit the first value of each range. In CoolCoder's example the value 40 will be omitted.
splattne
Too slow for the number of records (around 300,000) and maximum range (16777216).
Coolcoder
+3  A: 

If you want a CTE, here is an example:

Initial insert:

insert into rangeTable (StartValue, RangeValue)
select 40,100
union all select 150,10
go

the query:

with r_CTE (startVal, rangeVal, generatedVal)
as
(
    select r.startValue, r.rangeValue, r.startValue
    from rangeTable r
    union all
    select r.startValue, r.rangeValue, generatedVal+1
    from rangeTable r
    inner join r_CTE rc 
     on r.startValue = rc.startVal
     and r.rangeValue = rc.rangeVal
     and r.startValue +  r.rangeValue > rc.generatedVal + 1
)
select * from r_CTE
order by startVal, rangeVal, generatedVal

Just be aware that the default maximum recursion depth is 100. You can raise it to the hard maximum of 32767 by appending

option (maxrecursion 32767)

to the query, or remove the limit entirely with

option (maxrecursion 0)

See BOL for details.
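For example, the hint attaches to the outer statement that selects from the CTE, not to the CTE definition itself:

```sql
-- OPTION goes at the very end of the statement referencing the CTE.
select * from r_CTE
order by startVal, rangeVal, generatedVal
option (maxrecursion 0)  -- remove the recursion limit
```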

kristof
+1 for answering the question. But I guess CoolCoder won't use it. My brain hurts while I'm trying to understand the query!
splattne
@splattne - I suppose the approach you suggested, but with a pre-populated permanent table, would work best for a large number of records. As for the CTE: my brain hurts as well; I haven't used recursion for ages, so I thought I should give it a go :)
kristof
The range values can be greater than 32767 so a CTE is out of the question
Coolcoder
But your answer works for the question I asked...
Coolcoder
I'll have to try that to see if it's faster than a T-SQL loop. Bit apprehensive about the cost of recursing so deep...
Dems