Can someone explain why levels() shows three factor levels, while you can see that the vector has only two?

> str(walk.df)
'data.frame':   10 obs. of  4 variables:
 $ walker : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2

> walk.df$walker
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2 3

I would like to extract a vector of levels, and I thought this was the proper way, but as you can see, a three sneaks in there which is messing up my function.

> as.numeric(levels(walk.df$walker))
[1] 1 2 3
A: 

Probably walk.df$walker is a subset of a factor variable with three levels. For example,

a <- factor(1:3)
b <- a[1:2]

Then b still has three levels, even though only two of them appear in its values.

An easy way to drop the unused level is:

b <- a[1:2, drop = TRUE]

Or, if you cannot access the original variable,

b <- factor(b)
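If the factor lives inside a data frame, as in the question, droplevels() may be more convenient. This is only a sketch: raw.df and its step column are made-up stand-ins for the original data, which is not shown in the question.

```r
# Hypothetical raw data: walker has three levels, as the question implies.
raw.df <- data.frame(walker = factor(rep(1:3, each = 5)), step = 1:15)

# Subsetting rows keeps the full level set of the original factor.
walk.df <- raw.df[raw.df$walker != "3", ]
levels(walk.df$walker)   # "3" is still listed although no row uses it

# droplevels() removes unused levels from every factor column of a data frame
# (it also works on a single factor).
walk.df <- droplevels(walk.df)
levels(walk.df$walker)
```

Unlike factor(b), this handles all factor columns in one call, which may or may not be what you want.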
kohske
You are correct, it is a subset. I've been mincing the object so hard that I forgot I'm sampling from the raw object - which has three levels.
Roman Luštrik
A: 

You can assign more levels to a factor than its values actually use, for example three levels to a factor whose values take only two:

 > set.seed(1234)
 > x <- round(runif(10, 1, 2))
 > x
  [1] 1 2 2 2 2 2 1 1 2 2
 > y <- factor(x)
 > levels(y)
 [1] "1" "2"
 > levels(y) <- c("1", "2", "3")
 > y
  [1] 1 2 2 2 2 2 1 1 2 2
 Levels: 1 2 3

or even to a factor that has no levels at all:

 > p <- NA
 > q <- factor(p)
 > levels(q)
 character(0)
 > levels(q) <- c("1", "2", "3")
 > q
 [1] <NA>
 Levels: 1 2 3
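
Since the level set is stored separately from the values, it can help to check which levels are actually unused. A small sketch, with y built by hand here rather than taken from the examples above:

```r
# A factor whose declared levels include one that never occurs.
y <- factor(c(1, 2, 2, 1), levels = c("1", "2", "3"))

# table() reports a count for every level, including those with zero occurrences.
table(y)

# Levels that are declared but do not appear among the values.
unused <- setdiff(levels(y), as.character(y))
unused
```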
aL3xa
What I really wanted was to extract the levels that appear in the subset. I have solved this with list.of.walkers <- sort(unique(walk.df$label)).
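One caveat, in case the column is a factor: sort(unique()) returns a factor that still carries the unused levels, and as.numeric() on it gives the internal codes rather than the labels. A sketch with an invented walker vector (labels 10, 20, 30 are chosen only to make the difference visible):

```r
# Hypothetical factor: level "30" is declared but never used.
walker <- factor(c(20, 20, 10), levels = c(10, 20, 30))

present <- sort(unique(walker))   # still a factor with all three levels attached
as.numeric(present)               # internal integer codes, not the labels

# Go through the labels to get the numeric values that actually occur.
walkers.present <- as.numeric(as.character(present))

# Same result by dropping unused levels first and reading the level labels.
walkers.alt <- as.numeric(levels(droplevels(walker)))
```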
Roman Luštrik