views:

485

answers:

1

I was wondering if there is a way to include error terms for a linear regression model like

r = lm(y ~ x1+x2) ?

+1  A: 

The code r = lm(y ~ x1+x2) means we model y as a linear function of x1 and x2. Since the model will not be perfect, there will be a residual term (i.e. the left-over that the model failed to fit).

In maths, as Rob Hyndman noted in the comments, y = a + b1*x1 + b2*x2 + e, where a, b1 and b2 are constants and e is your residual (which is assumed to be normally distributed).

To look at a concrete example, consider the iris data that comes with R.

model1 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data=iris)

Now we can extract the constants from the model (equivalent to a, b1, b2 and in this case b3 too).

> coefficients(model1)
(Intercept)  Sepal.Width Petal.Length  Petal.Width 
1.8559975    0.6508372    0.7091320   -0.5564827

The residuals have been calculated for each row of data that was used in the model.

> residuals(model1)
           1             2             3             4             5       
0.0845842387  0.2100028184 -0.0492514176 -0.2259940935 -0.0804994772
# etc. There are 150 residuals and 150 rows in the iris dataset.


(EDIT: Cut summary info as not relevent.)


EDIT:

The Error value you mention in your comments in explained on the help page to aov.

If the formula contains a single ‘Error’ term, this is used to
specify error strata, and appropriate models are fitted within
each error stratum.

Compare the following (adapted from the ?aov page.)

> utils::data(npk, package="MASS")
> aov(yield ~  N*P*K, npk)
Call:
   aov(formula = yield ~ N * P * K, data = npk)

Terms:
                       N        P        K      N:P      N:K      P:K    N:P:K Residuals
Sum of Squares  189.2817   8.4017  95.2017  21.2817  33.1350   0.4817  37.0017  491.5800
Deg. of Freedom        1        1        1        1        1        1        1        16

Residual standard error: 5.542901 
Estimated effects may be unbalanced

> aov(yield ~  N*P*K + Error(block), npk)
Call:
aov(formula = yield ~ N * P * K + Error(block), data = npk)

Grand Mean: 54.875 

Stratum 1: block

Terms:
                    N:P:K Residuals
Sum of Squares   37.00167 306.29333
Deg. of Freedom         1         4

Residual standard error: 8.750619 
Estimated effects are balanced

Stratum 2: Within

Terms:
                        N         P         K       N:P       N:K       P:K Residuals
Sum of Squares  189.28167   8.40167  95.20167  21.28167  33.13500   0.48167 185.28667
Deg. of Freedom         1         1         1         1         1         1        12

Residual standard error: 3.929447 
1 out of 7 effects not estimable
Estimated effects may be unbalanced
Richie Cotton
Hey Richie, this looks interesting . So I thought I may try this in R : > v = aov(yield ~ N*P*K, npk)>v2 = aov(yield ~ N*P*K + Error(block), npk)> coefficients(v)> coefficients(v2)What type of regression is this ? And what are the coefficients such as N1:P1 and N1:P1:K1 ?
phpdash
Richie Cotton