views: 118

answers: 3

I've read the answers to this question and they are quite helpful, but I need help particularly in R.

I have an example data set in R as follows:

x <- c(32,64,96,118,126,144,152.5,158)  
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)

I want to fit a model y = f(x) to these data, specifically a 3rd-order polynomial.

How can I do that in R?

Additionally, can R help me to find the best fitting model?

+2  A: 

To get a third-order polynomial in x (i.e. terms up to x^3), you can do

lm(y ~ x + I(x^2) + I(x^3))

or

lm(y ~ poly(x, 3, raw=TRUE))

You could fit a 10th order polynomial and get a near-perfect fit, but should you?

EDIT: poly(x, 3) is probably a better choice (see @hadley below).
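
For example, applied to the data in the question, a cubic fit can be inspected and plotted like this (a minimal sketch using the orthogonal poly(x, 3) form; summary() and predict() are base R):

x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

fit <- lm(y ~ poly(x, 3))          # cubic fit with orthogonal polynomials
summary(fit)                       # coefficients, R-squared, residual standard error

plot(x, y)
xx <- seq(min(x), max(x), length.out = 200)
lines(xx, predict(fit, data.frame(x = xx)))   # fitted curve over a fine grid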

Greg
@greg is spot on in asking "should you". The sample data only has 8 points. Degrees of freedom are pretty low here. The real life data may have a lot more, of course.
JD Long
Thanks for your answer. What about getting R to find the best fitting model? Are there any functions for this?
Mehper C. Palavuzlar
It depends on your definition of "best model". The model that gives you the greatest R^2 (which a 10th order polynomial would) is not necessarily the "best" model. The terms in your model need to be reasonably chosen. You can get a near-perfect fit with a lot of parameters but the model will have no predictive power and will be useless for anything other than drawing a best fit line through the points.
Greg
Why are you using `raw = T`? It's better to use uncorrelated variables.
hadley
I did it to get the same results as `lm(y ~ x + I(x^2) + I(x^3))`. Perhaps not optimal, just giving two means to the same end.
Greg
A: 

Regarding the question "can R help me find the best fitting model": there is probably a function to do this, assuming you can state the set of models to test, but the following would be a good first approach for the set of polynomials of degree 1 to n-1:

# AIC of a polynomial fit of degree i (assumes x and y are already defined)
polyfit <- function(i) AIC(lm(y ~ poly(x, i)))
# degree between 1 and n-1 that minimises AIC
as.integer(optimize(polyfit, interval = c(1, length(x) - 1))$minimum)

Notes

  • The validity of this approach will depend on your objectives, the assumptions of optimize() and AIC(), and whether AIC is the criterion you want to use.

  • polyfit() may not have a single minimum. Check this with something like:

    for (i in 2:(length(x) - 1)) print(polyfit(i))
    
  • I used the as.integer() function because it is not clear how to interpret a non-integer polynomial degree; a sketch restricted to whole-number degrees follows these notes.

  • For testing an arbitrary set of mathematical equations, consider the 'Eureqa' program reviewed by Andrew Gelman here.
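
Since the degree must be a whole number, a simpler sketch (assuming the x and y from the question, and keeping at least one residual degree of freedom) evaluates AIC at each integer degree and picks the minimum:

degrees <- 1:(length(x) - 2)                       # keep at least one residual df
aics <- sapply(degrees, function(i) AIC(lm(y ~ poly(x, i))))
degrees[which.min(aics)]                           # degree with the lowest AIC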

David
+1  A: 

Which model is the "best fitting model" depends on what you mean by "best". R has tools to help, but you need to provide the definition for "best" to choose between them. Consider the following example data and code:

x <- 1:10
y <- x + c(-0.5,0.5)               # points alternate 0.5 below and above the line y = x

plot(x,y, xlim=c(0,11), ylim=c(-1,12))

fit1 <- lm( y~offset(x) -1 )       # the fixed line y = x (slope forced to 1, no intercept)
fit2 <- lm( y~x )                  # simple linear regression
fit3 <- lm( y~poly(x,3) )          # cubic polynomial
fit4 <- lm( y~poly(x,9) )          # 9th-degree polynomial, passes through every point
library(splines)
fit5 <- lm( y~ns(x, 3) )           # natural spline, 3 degrees of freedom
fit6 <- lm( y~ns(x, 9) )           # natural spline, 9 degrees of freedom

fit7 <- lm( y ~ x + cos(x*pi) )    # straight line plus a cosine term

# draw each fitted curve on a fine grid that extends beyond the data
xx <- seq(0,11, length.out=250)
lines(xx, predict(fit1, data.frame(x=xx)), col='blue')
lines(xx, predict(fit2, data.frame(x=xx)), col='green')
lines(xx, predict(fit3, data.frame(x=xx)), col='red')
lines(xx, predict(fit4, data.frame(x=xx)), col='purple')
lines(xx, predict(fit5, data.frame(x=xx)), col='orange')
lines(xx, predict(fit6, data.frame(x=xx)), col='grey')
lines(xx, predict(fit7, data.frame(x=xx)), col='black')

Which of those models is the best? Arguments could be made for any of them (but I for one would not want to use the purple one for interpolation).
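
For instance, if "best" is taken to mean lowest AIC, the fits above can be compared directly, and nested fits can be tested against each other with anova(); this is only a sketch of one possible criterion (fit4 and fit6 are left out because they interpolate all ten points, so their AIC values are not meaningful):

AIC(fit1, fit2, fit3, fit5, fit7)   # lower AIC = "better" by this criterion
anova(fit2, fit3)                   # F-test: does the cubic improve on the straight line?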

Greg Snow