ansaurus

Question

How can I create a series of months to join sparse data to?

Answer 1

+1 A:

How about making a new table called Months, then populating it with data you can join to?
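
A minimal sketch of that idea, not taken from the original answer. The names dbo.Months, dbo.table1 and the [timestamp] column are assumptions chosen for illustration; adjust them to your schema. It builds the twelve months of 2010 and left joins the sparse data to them, so months with no rows still appear with a count of 0.

```sql
-- Hypothetical names: dbo.Months, dbo.table1 and [timestamp] are assumptions.
CREATE TABLE dbo.Months (
    month_start datetime NOT NULL PRIMARY KEY  -- first day of each month
);

-- Populate the twelve months of 2010 with a simple loop.
DECLARE @i int;
SET @i = 0;
WHILE @i < 12
BEGIN
    INSERT INTO dbo.Months (month_start)
    VALUES (DATEADD(month, @i, '20100101'));
    SET @i = @i + 1;
END;

-- Every month appears once; months with no matching rows get a count of 0.
SELECT m.month_start,
       COUNT(t.[timestamp]) AS row_count
FROM dbo.Months AS m
LEFT JOIN dbo.table1 AS t
       ON t.[timestamp] >= m.month_start
      AND t.[timestamp] <  DATEADD(month, 1, m.month_start)
GROUP BY m.month_start
ORDER BY m.month_start;
```

Joining on a date range rather than on MONTH(t.[timestamp]) keeps different years apart and leaves the column free of function calls.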

Randy 2010-06-25 17:51:45

Thanks for the suggestion Randy. I have done that. I've created many "complete series" tables and joined on them. I was hoping there was a better way. Also, I was hoping someone could tell me what this concept is called.

Kuyenda 2010-06-25 17:59:22

Answer 2

+2 A:

Queries like this are one of the main reasons that many experienced DBAs or database programmers keep a calendar table in their databases.

Toby 2010-06-25 17:57:34

+1 for the link to calendar table.

Gilbert Le Blanc 2010-06-25 18:25:19

I have also kept a table of a through z and 0 through 999999. That was for PHP. Thanks!

Kuyenda 2010-06-25 18:44:55

A calendar/date/numbers table is the only means of doing this on SQL Server 2000 or older. For SQL Server 2005+, a recursive CTE is more efficient.

OMG Ponies 2010-06-25 18:49:28

The recursive CTE is nice, but the 100 loop cap is a bummer.

Kuyenda 2010-06-28 21:47:13

Answer 3

+2 A:

try something like this:

DECLARE @StartDate datetime
       ,@EndDate datetime
SELECT @StartDate=DATEADD(month,-6,DATEADD(month,DATEDIFF(month,0,GETDATE()),0) )
      ,@EndDate=GETDATE()

;with AllDates AS
(
    SELECT @StartDate AS DateOf
    UNION ALL
    SELECT DateAdd(month,1,DateOf)
        FROM AllDates
    WHERE DateOf<@EndDate
)
SELECT * FROM AllDates

output:

DateOf
-----------------------
2009-12-01 00:00:00.000
2010-01-01 00:00:00.000
2010-02-01 00:00:00.000
2010-03-01 00:00:00.000
2010-04-01 00:00:00.000
2010-05-01 00:00:00.000
2010-06-01 00:00:00.000
2010-07-01 00:00:00.000

(8 row(s) affected)

KM 2010-06-25 18:08:50

Answer 4

+2 A:

I'm with KM: for SQL Server 2005+ you could use a recursive CTE:

WITH months AS (
  SELECT DATENAME(mm, '2010-01-01') AS month_name, 
         MONTH('2010-01-01') AS month_number, 
         CAST('2010-01-01' AS DATETIME) AS dt
  UNION ALL
  SELECT DATENAME(mm, DATEADD(mm, 1, m.dt)),
         MONTH(DATEADD(mm, 1, m.dt)),
         DATEADD(mm, 1, m.dt)
    FROM months m
   WHERE DATEADD(mm, 1, m.dt) <= '2010-12-01')
   SELECT x.month_name,
          y.*
     FROM months x
LEFT JOIN your_table y ON MONTH(y.date) = x.month_number

After all, the last time KM & I chatted about this, we found a recursive CTE to be slightly more efficient than using a numbers table.

OMG Ponies 2010-06-25 18:13:46

Answer 5

+2 A:

I like this approach to build the months table:

SELECT 
  DATENAME(mm, date_val) AS month_name,  
  MONTH(date_val) AS month_number,  
  date_val as dt
FROM ( 
  SELECT DATEADD(mm, number, '2010-01-01') AS date_val
  FROM master.dbo.spt_values
  WHERE type = 'P'
  AND number BETWEEN 0 AND 11
) months

Based on my tests, it's faster than a CTE. I'm running SQL Server 2008 Express.

Here are the test results, using SET STATISTICS IO ON and SET STATISTICS TIME ON

CTE:

(12 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 64 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Subquery:

(12 row(s) affected)
Table 'spt_values'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Your original question also asks what this is called. I don't know a name for it; maybe something like "left outer join against a series"?

One additional point: when you join against the months table, or even when you run your original query, it's typically recommended to avoid applying a function like YEAR([timestamp]) to the column in your WHERE clause.

So this code:

SELECT                     
    YEAR([timestamp]),                     
    MONTH([timestamp]),                     
    COUNT(*)                     
FROM table1                     
WHERE YEAR([timestamp]) = YEAR(GETDATE())                     
GROUP BY                     
    YEAR([timestamp]),                     
    MONTH([timestamp])

...will cause an index scan (assuming timestamp is indexed) because YEAR([timestamp]) must be evaluated for every row. On a table with a million or more rows, this will mean poor performance.

So you will typically see a recommendation like this instead:

SELECT                     
    YEAR([timestamp]),                     
    MONTH([timestamp]),                     
    COUNT(*)                     
FROM table1                     
WHERE [timestamp] >= DATEADD(YY, DATEDIFF(YY, 0, GETDATE()), 0) -- First day of this year
AND   [timestamp] < DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) + 1, 0) -- First day of next year
GROUP BY                     
    YEAR([timestamp]),                     
    MONTH([timestamp])

This will use an index seek (again, assuming timestamp is an indexed column) and result in fewer logical reads and thus a faster response. This can be confirmed by checking the execution plan.

8kb 2010-06-25 20:28:17

Thanks 8kb! What about creating a memory table by looping through whatever integers or other values you need and left joining on that? The memory table approach seems more flexible, and you always know what's in the table. One thing I don't like about the spt_values and recursive CTE based solutions is the limit. There's no limit on a memory table.
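
A rough sketch of the looped "memory table" idea described above, using a table variable. The names @Numbers and @Start are illustrative and not from the thread. Note that a table variable is not guaranteed to live purely in memory, since SQL Server backs it with tempdb, but it does behave like a private scratch table whose contents you control.

```sql
-- Sketch only: @Numbers and @Start are illustrative names, not from the thread.
DECLARE @Start datetime;
SET @Start = '20100101';

DECLARE @Numbers TABLE (n int NOT NULL PRIMARY KEY);

-- Fill the table variable with 0..11 using a loop.
DECLARE @i int;
SET @i = 0;
WHILE @i < 12
BEGIN
    INSERT INTO @Numbers (n) VALUES (@i);
    SET @i = @i + 1;
END;

-- Turn each integer into the first day of a month; this result can be
-- used as the left side of a LEFT JOIN like any other series.
SELECT DATEADD(month, n, @Start) AS month_start
FROM @Numbers
ORDER BY n;
```

Changing the loop bound is all it takes to extend the series, which is the flexibility the comment is after.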

Kuyenda 2010-06-28 21:22:37

In my experience, a Numbers table or Tally table is typically considered a best practice, so I would agree. But in terms of limitations, with a CTE you can set the MAXRECURSION level up to 32767, which seems more than adequate for dates. I used MAXRECURSION 2000 on OMG's query and was able to get 100 years' worth of months. But for spt_values, it's true you're limited to 2048 rows.
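
For reference, a sketch of how the recursion cap can be raised on a month series. This is not the exact query that was run; the CTE name AllMonths and the date range are assumptions. The default cap is 100 recursions, and the OPTION (MAXRECURSION n) hint on the outer statement changes it.

```sql
-- Sketch: 100 years of month starts (1200 rows) from a recursive CTE.
-- AllMonths and the date range are illustrative assumptions.
DECLARE @StartDate datetime, @EndDate datetime;
SELECT @StartDate = '19500101', @EndDate = '20491201';

WITH AllMonths AS (
    SELECT @StartDate AS month_start
    UNION ALL
    SELECT DATEADD(month, 1, month_start)
    FROM AllMonths
    WHERE DATEADD(month, 1, month_start) <= @EndDate  -- stop at the last month
)
SELECT month_start
FROM AllMonths
OPTION (MAXRECURSION 2000);  -- default is 100; 0 removes the cap entirely
```

Without the hint, this query would fail once it passed 100 recursions.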

8kb 2010-06-28 22:09:45
