views: 118

answers: 3

I've read the answers to this question and they are quite helpful, but I need help particularly in R.

I have an example data set in R as follows:

x <- c(32,64,96,118,126,144,152.5,158)  
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)

I want to fit a model y = f(x) to these data, specifically a 3rd-order polynomial.

How can I do that in R?

Additionally, can R help me to find the best fitting model?

+2  A: 

To get a third-order polynomial in x (i.e. terms up to x^3), you can do

lm(y ~ x + I(x^2) + I(x^3))

or

lm(y ~ poly(x, 3, raw=TRUE))

You could fit a 10th order polynomial and get a near-perfect fit, but should you?

EDIT: poly(x, 3) is probably a better choice (see @hadley below).
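
For example, applied to the data in the question, a cubic fit can be inspected and plotted like this (a minimal sketch using the orthogonal poly(x, 3) form; summary() and predict() are base R):

x <- c(32, 64, 96, 118, 126, 144, 152.5, 158)
y <- c(99.5, 104.8, 108.5, 100, 86, 64, 35.3, 15)

fit <- lm(y ~ poly(x, 3))          # cubic fit with orthogonal polynomials
summary(fit)                       # coefficients, R-squared, residual standard error

plot(x, y)
xx <- seq(min(x), max(x), length.out = 200)
lines(xx, predict(fit, data.frame(x = xx)))   # fitted curve over a fine grid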

Greg
@greg is spot on in asking "should you". The sample data only has 8 points. Degrees of freedom are pretty low here. The real life data may have a lot more, of course.
JD Long
Thanks for your answer. What about getting R to find the best fitting model? Are there any functions for this?
Mehper C. Palavuzlar
It depends on your definition of "best model". The model that gives you the greatest R^2 (which a 10th order polynomial would) is not necessarily the "best" model. The terms in your model need to be reasonably chosen. You can get a near-perfect fit with a lot of parameters but the model will have no predictive power and will be useless for anything other than drawing a best fit line through the points.
Greg
Why are you using `raw = T`? It's better to use uncorrelated variables.
hadley
I did it to get the same results as `lm(y ~ x + I(x^2) + I(x^3))`. Perhaps not optimal, just giving two means to the same end.
Greg
A: 

Regarding the question "can R help me find the best fitting model": there is probably a function to do this, assuming you can state the set of models to test, but the following would be a good first approach for the set of polynomials of degree 1 to n-1:

# AIC of a polynomial fit of degree i (assumes x and y are already defined)
polyfit <- function(i) AIC(lm(y ~ poly(x, i)))
# degree between 1 and n-1 that minimises AIC
as.integer(optimize(polyfit, interval = c(1, length(x) - 1))$minimum)

Notes

  • The validity of this approach will depend on your objectives, the assumptions of optimize() and AIC(), and whether AIC is the criterion you want to use.

  • polyfit() may not have a single minimum. Check this with something like:

    for (i in 2:(length(x) - 1)) print(polyfit(i))
    
  • I used the as.integer() function because it is not clear how to interpret a non-integer polynomial degree; a sketch restricted to whole-number degrees follows these notes.

  • For testing an arbitrary set of mathematical equations, consider the 'Eureqa' program reviewed by Andrew Gelman here.
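
Since the degree must be a whole number, a simpler sketch (assuming the x and y from the question, and keeping at least one residual degree of freedom) evaluates AIC at each integer degree and picks the minimum:

degrees <- 1:(length(x) - 2)                       # keep at least one residual df
aics <- sapply(degrees, function(i) AIC(lm(y ~ poly(x, i))))
degrees[which.min(aics)]                           # degree with the lowest AIC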

David
+1  A: 

Which model is the "best fitting model" depends on what you mean by "best". R has tools to help, but you need to provide the definition for "best" to choose between them. Consider the following example data and code:

x <- 1:10
y <- x + c(-0.5,0.5)               # points alternate 0.5 below and above the line y = x

plot(x,y, xlim=c(0,11), ylim=c(-1,12))

fit1 <- lm( y~offset(x) -1 )       # the fixed line y = x (slope forced to 1, no intercept)
fit2 <- lm( y~x )                  # simple linear regression
fit3 <- lm( y~poly(x,3) )          # cubic polynomial
fit4 <- lm( y~poly(x,9) )          # 9th-degree polynomial, passes through every point
library(splines)
fit5 <- lm( y~ns(x, 3) )           # natural spline, 3 degrees of freedom
fit6 <- lm( y~ns(x, 9) )           # natural spline, 9 degrees of freedom

fit7 <- lm( y ~ x + cos(x*pi) )    # straight line plus a cosine term

# draw each fitted curve on a fine grid that extends beyond the data
xx <- seq(0,11, length.out=250)
lines(xx, predict(fit1, data.frame(x=xx)), col='blue')
lines(xx, predict(fit2, data.frame(x=xx)), col='green')
lines(xx, predict(fit3, data.frame(x=xx)), col='red')
lines(xx, predict(fit4, data.frame(x=xx)), col='purple')
lines(xx, predict(fit5, data.frame(x=xx)), col='orange')
lines(xx, predict(fit6, data.frame(x=xx)), col='grey')
lines(xx, predict(fit7, data.frame(x=xx)), col='black')

Which of those models is the best? Arguments could be made for any of them (but I for one would not want to use the purple one for interpolation).
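
For instance, if "best" is taken to mean lowest AIC, the fits above can be compared directly, and nested fits can be tested against each other with anova(); this is only a sketch of one possible criterion (fit4 and fit6 are left out because they interpolate all ten points, so their AIC values are not meaningful):

AIC(fit1, fit2, fit3, fit5, fit7)   # lower AIC = "better" by this criterion
anova(fit2, fit3)                   # F-test: does the cubic improve on the straight line?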

Greg Snow