views:

564

answers:

4

Running into a problem.

I have a table defined to hold the values of the daily treasury yield curve.

It's a pretty simple table used for historical lookup of values.

There are notibly some gaps in the table on year 4, 6, 8, 9, 11-19 and 21-29.

The formula is pretty simple in that to calculate year 4 it's 0.5*Year3Value + 0.5*Year5Value.

The problem is how can I write a VIEW that can return the missing years?

I could probably do it in a stored procedure but the end result needs to be a view.

A: 
WITh cal(year) AS
        (
        SELECT  1 AS current_year
        UNION ALL
        SELECT  year + 1
        FROM    cal
        WHERE   year < 100
        )
SELECT  CASE WHEN yield_year IS NULL THEN
             0.5 *
             (
             SELECT  TOP 1 yield_value
             FROM    yield
             WHERE   yield_year < year
             ORDER BY
                     yield_year DESC
             ) +
             0.5 *
             (
             SELECT  TOP 1 yield_value
             FROM    yield
             WHERE   yield_year > year
             ORDER BY
                     yield_year ASC
             )
         ELSE
             yield_value
         END
FROM     cal
LEFT JOIN
         yield
ON       yield_year = year

For missing years this query takes the average of closest years found.

Quassnoi
I believe the 'cal' CTE will break because of recursion limit.
van
@van: sure, forgot the limiter. Corrected.
Quassnoi
+1  A: 

You could try unpivot to get the years & values in a list.

Then union this to the missing years select YearNo , (select YearValue where YearNo = YearNo-1) * 0.5 + (select YearValue where YearNo = YearNo+1) * 0.5 AS YearValue from unpivotedlist where YearNo in (our missing list of years)

Then pivot it back up again to get the format you require and pop it in a view?

BeckyLou
+1  A: 

I'm going to make the guess that you want the curve to smoothly move between two years if there is a gap, so if more than one year is missing you don't want to just average the two closest years. Here's what I would probably use:

SELECT
     NUM.number AS year,
     COALESCE(YC.val, YC_BOT.val + ((NUM.number - YC_BOT.yr) * ((YC_TOP.val - YC_BOT.val)/(YC_TOP.yr - YC_BOT.yr))))
FROM
     dbo.Numbers NUM
LEFT OUTER JOIN dbo.Yield_Curve YC ON
     YC.yr = NUM.number
LEFT OUTER JOIN dbo.Yield_Curve YC_TOP ON
     YC.yr IS NULL AND       -- Only join if we couldn't find a current year value
     YC_TOP.yr > NUM.number
LEFT OUTER JOIN dbo.Yield_Curve YC_TOP2 ON
     YC_TOP2.yr > NUM.number AND
     YC_TOP2.yr < YC_TOP.yr
LEFT OUTER JOIN dbo.Yield_Curve YC_BOT ON
     YC.yr IS NULL AND       -- Only join if we couldn't find a current year value
     YC_BOT.yr < NUM.number
LEFT OUTER JOIN dbo.Yield_Curve YC_BOT2 ON
     YC_BOT2.yr < NUM.number AND
     YC_BOT2.yr > YC_BOT.yr
WHERE
     YC_TOP2.yr IS NULL AND
     YC_BOT2.yr IS NULL AND
     NUM.number BETWEEN @low_yr AND @high_yr

You could rewrite this using a CTE instead of the Numbers table (just a table of consecutive numbers). You could also use NOT EXISTS or subqueries with MIN and MAX instead of the LEFT OUTER JOINs on YC_BOT2 and YC_TOP2 if you wanted to do that. Some people find this method confusing.

Tom H.
+5  A: 

Taking the assumption by Tom H. that what you really want is a linear interpolation and the fact that not just years, but also months are missing, you need to base every calculation on MONTH, not YEAR.

For the code below I assume that you have 2 tables (one of which can be computed as part of the view):

  • Yield: contains real data and stored PeriodM in number-of-month rather then name. If you store PeriodName there, you would just need to join on the table:
  • Period (can be computed in the view like shown): stores period name and number of months it represents

Following code must work (you just need to create a view based on it):

WITH "Period" (PeriodM, PeriodName) AS (
    -- // I would store it as another table basically, but having it as part of the view would do
                SELECT  01, '1 mo'
    UNION ALL   SELECT  02, '2 mo' -- // data not stored
    UNION ALL   SELECT  03, '3 mo'
    UNION ALL   SELECT  06, '6 mo'
    UNION ALL   SELECT  12, '1 yr'
    UNION ALL   SELECT  24, '2 yr'
    UNION ALL   SELECT  36, '3 yr'
    UNION ALL   SELECT  48, '4 yr' -- // data not stored
    UNION ALL   SELECT  60, '5 yr'
    UNION ALL   SELECT  72, '6 yr' -- // data not stored
    UNION ALL   SELECT  84, '7 yr'
    UNION ALL   SELECT  96, '8 yr' -- // data not stored
    UNION ALL   SELECT 108, '9 yr' -- // data not stored
    UNION ALL   SELECT 120, '10 yr'
    -- ... // add more
    UNION ALL   SELECT 240, '20 yr'
    -- ... // add more
    UNION ALL   SELECT 360, '30 yr'
)
, "Yield" (ID, PeriodM, Date, Value) AS (
    -- // ** This is the TABLE your data is stored in **
    -- // 
    -- // value of ID column is not important, but it must be unique (you may have your PK)
    -- // ... it is used for a Tie-Breaker type of JOIN in the view
    -- //
    -- // This is just a test data:
                SELECT 101, 01 /* '1 mo'*/, '2009-05-01', 0.06
    UNION ALL   SELECT 102, 03 /* '3 mo'*/, '2009-05-01', 0.16
    UNION ALL   SELECT 103, 06 /* '6 mo'*/, '2009-05-01', 0.31
    UNION ALL   SELECT 104, 12 /* '1 yr'*/, '2009-05-01', 0.49
    UNION ALL   SELECT 105, 24 /* '2 yr'*/, '2009-05-01', 0.92
    UNION ALL   SELECT 346, 36 /* '3 yr'*/, '2009-05-01', 1.39
    UNION ALL   SELECT 237, 60 /* '5 yr'*/, '2009-05-01', 2.03
    UNION ALL   SELECT 238, 84 /* '7 yr'*/, '2009-05-01', 2.72
    UNION ALL   SELECT 239,120 /*'10 yr'*/, '2009-05-01', 3.21
    UNION ALL   SELECT 240,240 /*'20 yr'*/, '2009-05-01', 4.14
    UNION ALL   SELECT 250,360 /*'30 yr'*/, '2009-05-01', 4.09
)
, "ReportingDate" ("Date") AS (
    -- // this should be a part of the view (or a separate table)
    SELECT DISTINCT Date FROM "Yield"
)

-- // This is the Final VIEW that you want given the data structure as above
SELECT      d.Date, p.PeriodName, --//p.PeriodM,
            CAST(
                COALESCE(y_curr.Value,
                    (   (p.PeriodM - y_prev.PeriodM) * y_prev.Value
                    +   (y_next.PeriodM - p.PeriodM) * y_next.Value
                    ) / (y_next.PeriodM - y_prev.PeriodM)
                ) AS DECIMAL(9,4) -- // TODO: cast to your type if not FLOAT
            )  AS Value
FROM        "Period" p
CROSS JOIN  "ReportingDate" d
LEFT JOIN   "Yield" y_curr
        ON  y_curr.Date = d.Date
        AND y_curr.PeriodM = p.PeriodM
LEFT JOIN   "Yield" y_prev
        ON  y_prev.ID = (SELECT TOP 1 y.ID FROM Yield y WHERE y.Date = d.Date AND y.PeriodM <= p.PeriodM ORDER BY y.PeriodM DESC)
LEFT JOIN   "Yield" y_next
        ON  y_next.ID = (SELECT TOP 1 y.ID FROM Yield y WHERE y.Date = d.Date AND y.PeriodM >= p.PeriodM ORDER BY y.PeriodM ASC)

--//WHERE       d.Date = '2009-05-01'
van
+1 I think that this will actually work for me, many thanks!
SomeMiscGuy
So I was looking at the results set and its seeming more like a sine wave than a curve... Looking at the original formula folks gave me the 0.5 doesn't seem to work well for more than a 1yr gap, almost like for 11-19 I should calc 15 with 0.5 and then use a series to calc above and below that so year 11 is 0.8*[10yr]+0.2*(0.5*[10yr]+0.5*[20yr])--() is Year 15 and so on, with .6 and .4 for year 12, etc. Going to try it out
SomeMiscGuy
@Christopher Klein: 1) What is the original formula? 2) if you do 15' first, and then do 11' as you described using 10 and 15', then you will end up with the same number as if you used 10 and 20 anyways.
van
Year15; 0.5*[10yr]+0.5*[20yr]Year11; 0.8*[10yr]+0.2*[15yr]Year12; 0.6*[10yr]+0.4*[15yr]Year13; 0.4*[10yr]+0.6*[15yr]Year14; 0.2*[10yr]+0.8*[15yr]Year16; 0.8*[15yr]+0.2*[20yr]etc,etcthe way the code is going for Year11 its seeing the next 'period' s being Year 20 so the value jump between 10 and 11 goes 3.21 to 4.04 and then back to 3.95 for year 12, when it should go 3.210,3.303,3.396,3.489 and 3.675(Year15). Someone dug up an excel spreadsheet that they used a few years ago and this is how it played out.
SomeMiscGuy
@Christopher Klein: Chris. What you are describing is still a 'linear interpolation' and based on the same 'linear' weights, which will produce the same fitting line. Therefore in both cases [11y] will be the same. Lets see old caluation: [11y-old] = 9/10*[10y] + 1/10[20y]. Now lets see [11y-new] = 4/5*[10y] + 1/5*[15y], and lets substitute [15y] with you formula and we get: [11y-new] = 4/5*[10y] + 1/5*(5/10*[10y] + 5/10*[20y]) = 4/5*[10y] + 1/10*[10y] + 1/10*[20y] = 9/10*[10y] + 1/10[20y].As you can see, we get the same value :-)And it will be the same for other data points as well.
van
@Christopher Klein: ... and rough approximation is not good enough for your -customers-, then the 'right' way to do it would be to use spline to fit your yield curve. But I would not dare doing it in SQL. What you can do is to create a method to compute the whole spline and then store it either as the whole function for that date or just the values for the datapoints you care about. You would run this routine just after you downloaded your new data, and I would store the data in another table. then you just create a view to basically 'select *' from. Google for "cubic spline yield curve".
van