views:

484

answers:

6

I have a set of transactions occurring at specific points in time:

CREATE TABLE Transactions (
    TransactionDate Date NOT NULL,
    TransactionValue Integer NOT NULL
)

The data might be:

INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('1/1/2009', 1)
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('3/1/2009', 2)
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('6/1/2009', 3)

Assuming that the TransactionValue sets some kind of level, I need to know what the level was between the transactions. I need this in the context of a set of T-SQL queries, so it would be best if I could get a result set like this:

Month   Value
1/2009  1
2/2009  1
3/2009  2
4/2009  2
5/2009  2
6/2009  3

Note how, for each month, we either get the value specified in the transaction, or we get the most recent non-null value.

My problem is that I have little idea how to do this! I'm only an "intermediate" level SQL Developer, and I don't remember ever seeing anything like this before. Naturally, I could create the data I want in a program, or using cursors, but I'd like to know if there's a better, set-oriented way to do this.

I'm using SQL Server 2008, so if any of the new features will help, I'd like to hear about it.

P.S. If anyone can think of a better way to state this question, or even a better subject line, I'd greatly appreciate it. It took me quite a while to decide that "spread", while lame, was the best I could come up with. "Smear" sounded worse.

+1  A: 

I don't have access to BOL from my phone so this is a rough guide...

First, you need to generate the missing rows for the months you have no data. You can either use a OUTER join to a fixed table or temp table with the timespan you want or from a programmatically created dataset (stored proc or suchlike)

Second, you should look at the new SQL 2008 'analytic' funtions, like MAX(value) OVER ( partition clause ) to get the previous value.

(I KNOW Oracle can do this 'cause I needed it to calculate compounded interest calcs between transaction dates - same problem really)

Hope this points you in the right direction...

(Avoid throwing it into a temp table and cursoring over it. Too crude!!!)

Guy
I've taken a look at this idea, but can't figure out what partition to use.
John Saunders
+1  A: 

here's what i came up with

declare @Transactions table (TransactionDate datetime, TransactionValue int)

declare @MinDate datetime
declare @MaxDate datetime
declare @iDate datetime
declare @Month int
declare @count int
declare @i int
declare @PrevLvl int

insert into @Transactions (TransactionDate, TransactionValue)
select '1/1/09',1

insert into @Transactions (TransactionDate, TransactionValue)
select '3/1/09',2

insert into @Transactions (TransactionDate, TransactionValue)
select '5/1/09',3


select @MinDate = min(TransactionDate) from @Transactions
select @MaxDate = max(TransactionDate) from @Transactions

set @count=datediff(mm,@MinDate,@MaxDate)
set @i=1
set @iDate=@MinDate


while (@i<=@count)
begin

    set @iDate=dateadd(mm,1,@iDate)

    if (select count(*) from @Transactions where TransactionDate=@iDate) < 1
    begin

     select @PrevLvl = TransactionValue from @Transactions where TransactionDate=dateadd(mm,-1,@iDate)

     insert into @Transactions (TransactionDate, TransactionValue)
     select @iDate, @prevLvl

    end


    set @i=@i+1
end

select *
from @Transactions
order by TransactionDate
DForck42
wow, it didn't keep any of my spaces. depressing
DForck42
Which spaces? It looks like they're all present when I click "edit".
John Saunders
Weird. So did my answer help?
DForck42
I looks like it works. I want to play with it more, and with the other answer.
John Saunders
+1  A: 

To do it in a set-based way, you need sets for all of your data or information. In this case there's the overlooked data of "What months are there?" It's very useful to have a "Calendar" table as well as a "Number" table in databases as utility tables.

Here's a solution using one of these methods. The first bit of code sets up your calendar table. You can fill it using a cursor or manually or whatever and you can limit it to whatever date range is needed for your business (back to 1900-01-01 or just back to 1970-01-01 and as far into the future as you want). You can also add any other columns that are useful for your business.

CREATE TABLE dbo.Calendar
(
     date           DATETIME     NOT NULL,
     is_holiday     BIT          NOT NULL,
     CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (date)
)

INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-01', 1)  -- New Year
INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-02', 1)
...

Now, using this table your question becomes trivial:

SELECT
     CAST(MONTH(date) AS VARCHAR) + '/' + CAST(YEAR(date) AS VARCHAR) AS [Month],
     T1.TransactionValue AS [Value]
FROM
     dbo.Calendar C
LEFT OUTER JOIN dbo.Transactions T1 ON
     T1.TransactionDate <= C.date
LEFT OUTER JOIN dbo.Transactions T2 ON
     T2.TransactionDate > T1.TransactionDate AND
     T2.TransactionDate <= C.date
WHERE
     DAY(C.date) = 1 AND
     T2.TransactionDate IS NULL AND
     C.date BETWEEN '2009-01-01' AND '2009-12-31'  -- You can use whatever range you want
Tom H.
Tom, this looks intriguing, but when I tried it, it comes up with duplicate rows - two rows for every input row.
John Saunders
Not sure what to say. I just copied your code and ran it then copied my code and ran it and I got back 12 rows, one for each month. Make sure that the Calendar table doesn't have duplicates. Otherwise, it's pretty much the same approach as the one that John Gibb gave.
Tom H.
Maybe it's just too early in the morning, but I don't see the correspondence between your two left outer joins and his outer apply. Can you show me how they are equivalent, or how they differ?
John Saunders
Sorry, I meant more that they're the same in how they use a utility table. He uses the Numbers for months 1 to 12 and I use the Calendar with the first of each month.
Tom H.
+3  A: 

I'd start by building a Numbers table holding sequential integers from 1 to a million or so. They come in really handy once you get the hang of it.

For example, here is how to get the 1st of every month in 2008:

select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
from Numbers
where n <= 12;

Now, you can put that together using OUTER APPLY to find the most recent transaction for each date like so:

with Dates as (
 select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
 from Numbers
 where n <= 12
)
select d.firstOfMonth, t.TransactionValue
from Dates d
outer apply (
 select top 1 TransactionValue
 from Transactions
 where TransactionDate <= d.firstOfMonth
 order by TransactionDate desc
) t;

This should give you what you're looking for, but you might have to Google around a little to find the best way to create the Numbers table.

John Gibb
This looks like the winner. Thanks, John.
John Saunders
+1  A: 

John Gibb posted a fine answer, already accepted, but I wanted to expand on it a bit to:

  • eliminate the one year limitation,
  • expose the date range in a more explicit manner, and
  • eliminate the need for a separate numbers table.

This slight variation uses a recursive common table expression to establish the set of Dates representing the first of each month on or after from and to dates defined in DateRange. Note the use of the MAXRECURSION option to prevent a stack overflow (!); adjust as necessary to accommodate the maximum number of months expected. Also, consider adding alternative Dates assembly logic to support weeks, quarters, even day-to-day.

with 
DateRange(FromDate, ToDate) as (
  select 
    Cast('11/1/2008' as DateTime), 
    Cast('2/15/2010' as DateTime)
),
Dates(Date) as (
  select 
    Case Day(FromDate) 
      When 1 Then FromDate
      Else DateAdd(month, 1, DateAdd(month, ((Year(FromDate)-1900)*12)+Month(FromDate)-1, 0))
    End
  from DateRange
  union all
  select DateAdd(month, 1, Date)
  from Dates
  where Date < (select ToDate from DateRange)
)
select 
  d.Date, t.TransactionValue
from Dates d
outer apply (
  select top 1 TransactionValue
  from Transactions
  where TransactionDate <= d.Date
  order by TransactionDate desc
) t
option (maxrecursion 120);
ewbi
Recursive CTEs are slow at the time of this writing.
AlexKuznetsov
+1  A: 

If you do this type of analysis often, you might be interested in this SQL Server function I put together for exactly this purpose:

if exists (select * from dbo.sysobjects where name = 'fn_daterange') drop function fn_daterange;
go

create function fn_daterange
   (
   @MinDate as datetime,
   @MaxDate as datetime,
   @intval  as datetime
   )
returns table
--**************************************************************************
-- Procedure: fn_daterange()
--    Author: Ron Savage
--      Date: 12/16/2008
--
-- Description:
-- This function takes a starting and ending date and an interval, then
-- returns a table of all the dates in that range at the specified interval.
--
-- Change History:
-- Date        Init. Description
-- 12/16/2008  RS    Created.
-- **************************************************************************
as
return
   WITH times (startdate, enddate, intervl) AS
      (
      SELECT @MinDate as startdate, @MinDate + @intval - .0000001 as enddate, @intval as intervl
         UNION ALL
      SELECT startdate + intervl as startdate, enddate + intervl as enddate, intervl as intervl
      FROM times
      WHERE startdate + intervl <= @MaxDate
      )
   select startdate, enddate from times;

go

it was an answer to this question, which also has some sample output from it.

Ron

Ron Savage
Ron, thanks. Turns out I didn't need to use it - found a way to not need to know - and besides, I don't work there (or anywhere) at the moment.
John Saunders