views:

408

answers:

4

I've seen many different ways to create and populate a numbers table. However, what is the best way to create and populate one? With "best" being defined from most to least important:

  • Table created with optimal indexing
  • rows generated fastest
  • simple code used to create and populate

If you don't know what a numbers table is, look here: Why should I consider using an auxiliary numbers table?

+1  A: 

I use numbers tables for primarily dummying up reports in BIRT without having to fiddle around with dynamic creation of recordsets.

I do the same with dates, having a table spanning from 10 years in the past to 10 years in the future (and hours of the day for more detailed reporting). It's a neat trick to be able to get values for all dates even if your 'real' data tables don't have data for them.

I have a script which I use to create these, something like (this is from memory):

drop table numbers; commit;
create table numbers (n integer primary key); commit;
insert into numbers values (0); commit;
insert into numbers select n+1 from numbers; commit;
insert into numbers select n+2 from numbers; commit;
insert into numbers select n+4 from numbers; commit;
insert into numbers select n+8 from numbers; commit;
insert into numbers select n+16 from numbers; commit;
insert into numbers select n+32 from numbers; commit;
insert into numbers select n+64 from numbers; commit;

The number of rows doubles with each line so it doesn't take a lot to produce truly huge tables.

I'm not sure I agree with you that it's important to be created fast since you only create it once. The cost of that is amortized over all the accesses to it, rendering that time fairly insignificant.

paxdiablo
each _commits;_ results in _Msg 3902, Level 16, State 1, Line 1The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION._
KM
only 128 rows get inserted
KM
@KM, the first point is easily fixed by beginning a transaction (DB/2, my DBMS of choice, is usually configured to autostart transactions). And, if you want more rows, you just have to add more inserts. Each one doubles the range so it's very easy to get up to large numbers if you want. I also prefer to give generic SQL solutions where possible rather than limit solutions to specific vendors.
paxdiablo
+2  A: 

i use this which is fast as hell:

insert into Numbers(N)
select top 1000000 row_number() over(order by t1.number) as N
from   master..spt_values t1 
       cross join master..spt_values t2
Mladen Prajdic
+1  A: 

I start with the following template, which is derived from numerous printings of Itzik Ben-Gan's routine:

;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
 select Number from Tally where Number <= 1000000

The "WHERE N<= 1000000" clause limits the output to 1 to 1 million, and can easily be adjusted to your desired range.

Since this is a WITH clause, it can be worked into an INSERT... SELECT... like so:

--  Sample use: create one million rows
CREATE TABLE dbo.Example (ExampleId  int  not null)  

DECLARE @RowsToCreate int
SET @RowsToCreate = 1000000

--  "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources)
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4,294,967,296 rows
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT Example (ExampleId)
 select Number
  from Tally
  where Number <= @RowsToCreate

Indexing the table after it's built will be the fastest way to index it.

Oh, and I'd refer to it as a "Tally" table. I think this is a common term, and you can find loads of tricks and examples by Googling it.

Philip Kelley
+3  A: 

here are some code examples taken from the web and from answers to this question.

For Each Method, I have modified the original code so each use the same table and column: NumbersTest and Number, with 10,000 rows or as close to that as possible. Also, I have provided links to the place of origin.

METHOD 1 here is a very slow looping method from here
avg 13.01 seconds
ran 3 times removed highest, here are times in seconds: 12.42, 13.60

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1)) 
SET NOCOUNT ON
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN 
    INSERT dbo.NumbersTest DEFAULT VALUES 
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 2 here is a much faster looping one from here
avg 1.1658 seconds
ran 11 times removed highest, here are times in seconds: 1.117, 1.140, 1.203, 1.170, 1.173, 1.156, 1.203, 1.153, 1.173, 1.170

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
    SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 3 Here is a single INSERT based on code from here
avg 488.6 milliseconds
ran 11 times removed highest, here are times in milliseconds: 686, 673, 623, 686,343,343,376,360,343,453

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<10000
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 4 here is a "semi-looping" method from here avg 348.3 milliseconds (it was hard to get good timing because of the "GO" in the middle of the code, any suggestions would be appreciated)
ran 11 times removed highest, here are times in milliseconds: 356, 360, 283, 346, 360, 376, 326, 373, 330, 373

DROP TABLE NumbersTest
DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest

METHOD 5 here is a single INSERT from Philip Kelley's answer
avg 92.7 milliseconds
ran 11 times removed highest, here are times in milliseconds: 80, 96, 96, 93, 110, 110, 80, 76, 93, 93

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH
  Pass0 as (select 1 as C union all select 1), --2 rows
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
  --I removed Pass5, since I'm only populating the Numbers table to 10,000
  Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
        (Number)
    SELECT Number
        FROM Tally
        WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 6 here is a single INSERT from Mladen Prajdic answer
avg 82.3 milliseconds
ran 11 times removed highest, here are times in milliseconds: 80, 80, 93, 76, 93, 63, 93, 76, 93, 76

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number  int  not null)  
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over(order by t1.number) as N
FROM master..spt_values t1 
    CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

METHOD 7 here is a single INSERT based on the code from here
avg 56.3 milliseconds
ran 11 times removed highest, here are times in milliseconds: 63, 50, 63, 46, 60, 63, 63, 46, 63, 46

DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
    INTO NumbersTest
    FROM sys.objects s1
    CROSS JOIN sys.objects s2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest

After looking at all these methods, I really like Method 7, which was the fastest and the code is fairly simple too.

KM