views:

106

answers:

4

Somehow I can´t find it in my notes... nor do find the obivous on the net. How can I tell R to use a certain level as reference if I use dummy explanatories in a regression? It´s just using some level by default.

lm(x ~ y + as.factor(b)) 

with b {0,1,2,3,4} . Let´s say I want to use 3 instead of the zero that is used by R.

Thx in advance !

+4  A: 

See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)

m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the factor b in DF by use of the relevel() function:

DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)

The models have estimated different reference levels.

> coef(m1)
(Intercept)           x          b2          b3          b4          b5 
  3.2903239   1.4358520   0.6296896   0.3698343   1.0357633   0.4666219 
> coef(m2)
(Intercept)           x          b1          b2          b4          b5 
 3.66015826  1.43585196 -0.36983433  0.25985529  0.66592898  0.09678759
Gavin Simpson
@ucfagls : Thx for the pointer on relevel(). I wasn't aware such a function existed.
Joris Meys
+1  A: 

The relevel() command is a shorthand method to your question. What it does is reorder the factor so that whatever is the ref level is first. Therefore, reordering your factor levels will also have the same effect but gives you more control. Perhaps you wanted to have levels 3,4,0,1,2. In that case...

bFactor <- factor(b, levels = c(3,4,0,1,2))

I prefer this method because it's easier for me to see in my code not only what the reference was but the position of the other values as well (rather than having to look at the results for that).

NOTE: DO NOT make it an ordered factor. A factor with a specified order and an ordered factor are not the same thing. lm() may start to think you want polynomial contrasts if you do that.

John
Polynomial contrasts, not a polynomial regression.
hadley
oops, fixed... you delete and I will :)
John
+2  A: 

Others have mentioned the relevel command which is the best solution if you want to change the base level for all analyses on your data (or are willing to live with changing the data).

If you don't want to change the data (this is a one time change, but in the future you want the default behavior again), then you can use a combination of the C (note uppercase) function to set contrasts and the contr.treatments function with the base argument for choosing which level you want to be the baseline. For example:

lm( Sepal.Width ~ C(Species,contr.treatment(3, base=2)), data=iris )
Greg Snow
+1  A: 

You can also manually tag the column with a contrasts attribute, which seems to be respected by the regression functions:

contrasts(df$factorcol) <- contr.treatment(levels(df$factorcol),
   base=which(levels(df$factorcol) == 'RefLevel'))
Harlan