ansaurus

Question

Answer 1

+3 A:

One approach is to use the reshape package to create a data.frame with years in columns and names in rows:

library(reshape)
cast(d, name ~ year, value = "numbers")

You could then use complete.cases to extract the rows of interest.
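
A minimal sketch of that last step, using base R's reshape() in place of the reshape package's cast() so it runs without extra packages. The sample data and the names wide, both and result are assumptions made for illustration, and d is assumed to hold one record per name per year.

```r
# Hypothetical sample data in long format: one row per name per year
d <- data.frame(
  numbers = c(1.2, 0.5, 2.1, 0.3, 1.7, 0.9, 1.1, 2.4, 0.6, 1.5),
  year    = rep(c(2008, 2009), 5),
  name    = c("John", "David", "Tom", "Kristin", "Lisa",
              "Eve", "David", "Tom", "Kristin", "Lisa"),
  stringsAsFactors = FALSE
)

# Wide format: one row per name, one numbers column per year
wide <- reshape(d, idvar = "name", timevar = "year", direction = "wide")

# A name missing a year has NA in that year's column, so
# complete.cases() keeps only the names present in every year
both <- wide[complete.cases(wide), ]

# Back to the long-format rows for those names
result <- d[d$name %in% both$name, ]
```

Here result holds the rows of d for the names recorded in both 2008 and 2009.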

hadley 2009-09-06 15:28:50

Thanks hadley! I was looking for a method that didn't involve casting and melting back and forth. I should have made this explicit. Thanks anyway!

Andreas 2009-09-06 20:49:23

Answer 2

+2 A:

If there is only one record per year, just count up the number of times each person appears in the dataset:

counts <- as.data.frame(table(name = d$name))

Then look for everyone who appeared twice:

subset(counts, Freq == 2)

hadley 2009-09-06 15:31:17

That was actually the case. But I would still need to subset d with counts$name, or something like that.

Andreas 2009-09-06 21:02:27

Yeah, but I figured you could work that out yourself ;)

hadley 2009-09-07 01:47:19

Yes - %in% is my new friend :-)

Andreas 2009-09-07 08:37:02
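
For reference, the %in% step discussed in these comments might look like the sketch below. The sample data and the names twice and result are assumptions for illustration; as in the answer, it only works when each person has exactly one record per year.

```r
# Hypothetical sample data: one record per name per year
d <- data.frame(
  numbers = c(1.2, 0.5, 2.1, 0.3, 1.7, 0.9, 1.1, 2.4, 0.6, 1.5),
  year    = rep(c(2008, 2009), 5),
  name    = c("John", "David", "Tom", "Kristin", "Lisa",
              "Eve", "David", "Tom", "Kristin", "Lisa"),
  stringsAsFactors = FALSE
)

# Count how many times each name appears
counts <- as.data.frame(table(name = d$name))

# Names that appear exactly twice, i.e. once in each of the two years
twice <- as.character(counts$name[counts$Freq == 2])

# Keep only the rows of d belonging to those names
result <- d[d$name %in% twice, ]
```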

Answer 3

+1 A:

Here's another solution that uses just base R and doesn't make any assumptions about the number of records a person has per year:

# build the columns directly; wrapping them in cbind() would coerce
# numbers and year to character
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin",
                         "Lisa", "Eve", "David", "Tom", "Kristin",
                         "Lisa"))
# split data into 2 data.frames (1 for each year)
by.year <- split(d, d$year, drop = TRUE)

# find the names that appear in both years
keep <- intersect(by.year[['2008']]$name, by.year[['2009']]$name)
# Or, if you had several years, use Reduce as a more general solution:
keep <- Reduce(intersect, lapply(by.year, '[[', 'name'))

# show the rows of the original dataset only if their $name field
# is in our 'keep' vector
d[d$name %in% keep,]

Steve Lianoglou 2009-09-06 16:26:30

Thanks a lot, Steve. I suspect Reduce will be very useful for me. I didn't know about it.

Andreas 2009-09-07 08:38:51
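
To see what Reduce does with more than two years, here is a small sketch on a made-up three-year dataset (d3 and its contents are assumptions for illustration). Reduce applies intersect pairwise across the per-year name vectors, so only names present in every year survive.

```r
# Hypothetical data covering three years; Lisa is missing from 2009
d3 <- data.frame(
  year = c(2008, 2008, 2008, 2009, 2009, 2010, 2010, 2010),
  name = c("David", "Tom", "Lisa", "David", "Tom", "David", "Lisa", "Tom"),
  stringsAsFactors = FALSE
)

# One data.frame per year
by.year <- split(d3, d3$year)

# intersect(intersect(names2008, names2009), names2010)
keep <- Reduce(intersect, lapply(by.year, `[[`, "name"))

# Rows for the names that appear in all three years
result <- d3[d3$name %in% keep, ]
```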

Answer 4

+11 A:

Simple way:

subset(
    d,
    name %in% intersect(name[year==2008], name[year==2009])
)

Marek 2009-09-06 17:43:09

Brilliant - didn't know about intersect or %in%. Thanks so much!!!

Andreas 2009-09-06 20:55:51
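
If the years should not be hard-coded, one possible variation is to count the distinct years per name and keep the names that cover all of them. This is only a sketch: the sample data and the names n.years, keep and result are assumptions for illustration.

```r
# Hypothetical sample data in long format
d <- data.frame(
  numbers = c(1.2, 0.5, 2.1, 0.3, 1.7, 0.9, 1.1, 2.4, 0.6, 1.5),
  year    = rep(c(2008, 2009), 5),
  name    = c("John", "David", "Tom", "Kristin", "Lisa",
              "Eve", "David", "Tom", "Kristin", "Lisa"),
  stringsAsFactors = FALSE
)

# Number of distinct years in which each name occurs
n.years <- tapply(d$year, d$name, function(y) length(unique(y)))

# Names that occur in every year found in the data
keep <- names(n.years)[n.years == length(unique(d$year))]

result <- subset(d, name %in% keep)
```

Because it counts distinct years rather than rows, this also works when a person has several records in the same year.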


tags:


data.frame subset long format
