Suppose I have a dataframe like this one:

df <- data.frame(id = c("a", "b", "a", "c", "e", "d", "e"), n = 1:7)

and a vector with ids like this one:

v <- c("a", "b")

How can I select the rows of the dataframe that match the ids in v? I can't use the id column for rownames because they are not unique. When I try that, I get:

 rownames(df) <- df[["id"]]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 1L, 3L, 5L,  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘a’, ‘e’ 
+6  A: 

Use

df[df$id %in% v,]
Shane
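For reference, here is a minimal sketch of what that one-liner returns with the data from the question. The names `keep` and `result` are only illustrative and do not come from the answer.

```r
df <- data.frame(id = c("a", "b", "a", "c", "e", "d", "e"), n = 1:7)
v <- c("a", "b")

# %in% yields one TRUE/FALSE per element of df$id
keep <- df$id %in% v

# Logical row index; the empty column slot keeps all columns
result <- df[keep, ]
print(result)
#   id n
# 1  a 1
# 2  b 2
# 3  a 3
```

Duplicate ids are not a problem here because the rows are selected by a logical vector rather than by row name.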
+4  A: 

This should do what you want:

ndx <- which(df$id %in% v)
df[ndx,]
doug
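A small sketch, using the question's data, of how the `which()` variant relates to indexing with the logical vector directly. The names `by_position` and `by_logical` are made up for this illustration.

```r
df <- data.frame(id = c("a", "b", "a", "c", "e", "d", "e"), n = 1:7)
v <- c("a", "b")

# which() turns the logical vector into integer row positions
ndx <- which(df$id %in% v)
print(ndx)
# [1] 1 2 3

by_position <- df[ndx, ]           # index by integer positions
by_logical  <- df[df$id %in% v, ]  # index by the logical vector itself

# Both approaches select the same rows
print(isTRUE(all.equal(by_position, by_logical)))
# [1] TRUE
```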
Beat you by 30 seconds. :)
Shane
Clearly what's needed on SO is a handicap clock for the experts: say, 45 seconds or so for your answers to sit on the server before posting. Though most of the time even that won't help me. :)
doug
awesome. +1 for both
amarillion
I'm just waiting for marek to come by and tell us that we're forgetting about NA values...
Shane
In practice, to deal with NA values in v and in the id column, I used this: df[df$id %in% v[!is.na(v)],].
amarillion
Here I come... Actually, `%in%` handles `NA` fine. It matches `NA` like any other value, no matter which of the two vectors contains the `NA`s. In other words, `NA %in% NA` returns `TRUE`.
Marek
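A short sketch of the NA behaviour discussed in these comments. The data frame `df_na` and vector `v_na` are hypothetical and only loosely modelled on the question's data.

```r
# Hypothetical data: a few ids, one of them missing
df_na <- data.frame(id = c("a", NA, "a", "c"), n = 1:4)
v_na <- c("a", NA)

# %in% treats NA as a value that can be matched
print(NA %in% NA)
# [1] TRUE

# So a row with an NA id is selected when the lookup vector contains NA
hits <- df_na$id %in% v_na
print(hits)
# [1]  TRUE  TRUE  TRUE FALSE

# Dropping NA from the lookup vector keeps NA ids out of the result
matched <- df_na[df_na$id %in% v_na[!is.na(v_na)], ]
print(matched$n)
# [1] 1 3
```

Whether NA ids should match is a design choice: the filter on `v_na` is only needed when rows with a missing id must be excluded.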
@Marek: ha! I was starting to get worried.
Shane