views:

4903

answers:

5

Hi,

I'm sure this is the kind of problem others have solved many times before.

A group of people are going to do measurements (Home energy usage to be exact). All of them will do that at different times and in different intervals.

So what I'll get from each person is a set of {date, value} pairs where there are dates missing in the set.

What I need is a complete set of {date, value} pairs where a value is known (either measured or calculated) for each date within the range. I expect that a simple linear interpolation would suffice for this project.

Assuming this must be done in Excel, what is the best way to interpolate in such a dataset (so that I have a value for every day)?

Thanks.

NOTE: When these datasets are complete I'll determine the slope (i.e. usage per day) and from that we can start doing home-to-home comparisons.

ADDITIONAL INFO after the first few suggestions: I do not want to manually figure out where the holes are in my measurement sets (too many incomplete measurement sets!). I'm looking for something existing and automatic to do that for me. So if my input is

{2009-06-01,  10}
{2009-06-03,  20}
{2009-06-06, 110}

Then I expect to automatically get

{2009-06-01,  10}
{2009-06-02,  15}
{2009-06-03,  20}
{2009-06-04,  50}
{2009-06-05,  80}
{2009-06-06, 110}

Yes, I can write software that does this. I am just hoping that someone already has a "ready to run" software (Excel) feature for this (rather generic) problem.
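For reference, the behaviour requested above can be sketched in a few lines outside Excel. This is only an illustration in Python, not an Excel feature; the function name `fill_daily` is made up for this example. It keeps every measured point as-is and fills each missing day on the straight line between the two neighbouring measurements.

```python
from datetime import date, timedelta

def fill_daily(measurements):
    """Return one (date, value) pair per day, linearly interpolating the gaps.

    measurements: non-empty list of (date, value) pairs, in any order.
    Measured points are kept exactly as given.
    """
    points = sorted(measurements)
    filled = []
    # Walk over each pair of neighbouring measurements.
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        span = (d1 - d0).days
        # offset 0 is the measured point itself; the rest are interpolated days.
        for offset in range(span):
            day = d0 + timedelta(days=offset)
            filled.append((day, v0 + (v1 - v0) * offset / span))
    filled.append(points[-1])  # the last measurement closes the series
    return filled

sample = [(date(2009, 6, 1), 10), (date(2009, 6, 3), 20), (date(2009, 6, 6), 110)]
result = fill_daily(sample)
for day, value in result:
    print(day.isoformat(), value)
```

With the three sample measurements from the question this yields the six daily values 10, 15, 20, 50, 80, 110.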

+1  A: 

There are two functions, LINEST and TREND, that you can try. Both fit a straight line to sets of known Xs and Ys by least squares. The difference is in what they return: LINEST returns the coefficients of the fitted line (slope and intercept), while TREND takes new X values and returns the corresponding Y values on that line.

Bill the Lizard
Thanks for the tips. I tried these two functions, and apparently both of them plot a single straight line through all data points. That's not what I was looking for. My primary requirement is that the measured points remain as-is. These functions 'break' this requirement.
Niels Basjes
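The effect described in the comment above is easy to reproduce. The sketch below (Python, with a hypothetical helper `least_squares_line`) fits one least-squares line through the three sample points from the question, using the day of the month as x. It is meant only to show that a single regression line generally does not pass through the measured points.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

days = [1, 3, 6]          # 2009-06-01, 2009-06-03, 2009-06-06
values = [10, 20, 110]    # measured values from the question
slope, intercept = least_squares_line(days, values)
fitted = [slope * x + intercept for x in days]
for x, measured, on_line in zip(days, values, fitted):
    print(x, measured, round(on_line, 2))
```

The fitted values differ from the measured ones at every sample day, which is why a fit over the whole dataset cannot satisfy the "measured points remain as-is" requirement.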
Use just two points for your known inputs to do the linear interpolation. So if you have measurements at 8:30, 9:00, 10:00, 10:30... and you want to estimate the measurement at 9:30, you'd only need the 9:00 and 10:00 measurements in the LINEST function, not the entire set. Do this for each data point you need an estimate for, using the two nearest bounding points.
Bill the Lizard
Perhaps I misinterpret your suggestion but to me this implies that I manually determine "Where the holes are". I'm a developer, inherently lazy, I want the software to figure that out for me.
Niels Basjes
A: 

A nice graphical way to see how well your interpolated results fit:

Take your date,value pairs and graph them using the XY chart in Excel (not the Line chart). Right-click on the resulting line on the graph and click 'Add trendline'. There are lots of different options to choose which type of curve fitting is used. Then you can go to the properties of the newly created trendline and display the equation and the R-squared value.

Make sure that when you format the trendline Equation label, you set the numerical format to have a high degree of precision, so that all of the significant digits of the equation constants are displayed.

Stewbob
+3  A: 
Deniss
Thank you. I expected this to be a standard part of Excel. This does what I need.
Niels Basjes
+1  A: 

Hi,

I came across this and was reluctant to use an add-in because it makes it tough to share the sheet with people who don't have the add-in installed.

My officemate designed a clean formula that is relatively compact (at the expense of using a bit of magic).

[image: example spreadsheet]

Things to note:

  • The formula works by:

    • using the MATCH function to find the row in the inputs range just before the value being searched for (e.g. 3 is the value just before 3.5)
    • using OFFSETs to select the square of that line and the next (in light purple)
    • using FORECAST to build a linear interpolation using just those two points, and getting the result
  • This formula cannot do extrapolations; make sure that your search value is between the endpoints (I do this in the example below by having extreme values).

Not sure if this is too complicated for folks; but it had the benefit of being very portable (and simpler than many alternate solutions).

If you want to copy-paste the formula, it is:

=FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1))

(inputs being a named range)
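For readers who want to check the logic outside Excel, here is a rough Python equivalent of the three steps listed above. It is a sketch, not a translation of the Excel internals: `bisect_right` stands in for MATCH, and the function name `interpolate` is invented for this example. It assumes the x values are sorted ascending and that there are at least two points.

```python
from bisect import bisect_right

def interpolate(x, xs, ys):
    """Linearly interpolate y at x from the two points that bracket x.

    xs must be sorted ascending and contain at least two values.
    Like the formula, this refuses to extrapolate.
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the known range; no extrapolation")
    # Index of the last known x that is <= x (the role MATCH plays),
    # clamped so that a following point always exists.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    # Straight line through (x0, y0) and (x1, y1), evaluated at x.
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

known_x = [1, 2, 3, 4, 5]
known_y = [10, 20, 30, 40, 50]
print(interpolate(3.5, known_x, known_y))
```

Searching for 3.5 picks the points at x = 3 and x = 4, as in the description of MATCH above.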

/YGA

YGA
A: 

Alternatively:

=INDEX(yVals,MATCH(J7,xVals,1))+(J7-INDEX(xVals,MATCH(J7,xVals,1)))*(INDEX(yVals,MATCH(J7,xVals,1)+1)-INDEX(yVals,MATCH(J7,xVals,1)))/(INDEX(xVals,MATCH(J7,xVals,1)+1)-INDEX(xVals,MATCH(J7,xVals,1)))

where J7 is the x value, xVals is the range of x values (sorted ascending, as MATCH with match type 1 requires) and yVals is the range of y values.

This form is easier to put into code.



darren