ansaurus

Question

How to split a data frame by rows, and then process the blocks?

Answer 1

+4 A:

You can use isplit (from the "iterators" package) to create an iterator object that loops over the blocks defined by the site column:

library(iterators)
site.data <- read.table("isplit-data.txt", header = TRUE)
# iterator over site.data, yielding one block of rows per distinct value of site
sites <- isplit(site.data, site.data$site)

Then you can use foreach (from the "foreach" package) to create a plot within each block:

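library(foreach)
# %dopar% runs sequentially (with a warning) unless a parallel backend is registered
foreach(site = sites) %dopar% {
  # each element holds $value (the block's rows) and $key (the block's site value)
  pdf(paste0(site$key[[1]], ".pdf"))
  plot(site$value$year, site$value$peak, main = site$key[[1]])
  dev.off()
}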
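If you would rather not depend on iterators and foreach, the same per-site plotting can be sketched in base R with split() and a plain for loop. This is a swapped-in technique, not the answer's method; the small site.data frame (columns site, year, peak) is made-up example data, and the PDFs are written to tempdir() rather than the working directory.

```r
# Hypothetical example data with the columns the answer uses: site, year, peak
site.data <- data.frame(
  site = c("A", "A", "B", "B", "B", "C"),
  year = c(2001, 2002, 2001, 2002, 2003, 2002),
  peak = c(10.5, 12.0, 7.2, 8.8, 9.1, 15.3)
)

# split() on a data frame returns a named list of data frames, one per site
blocks <- split(site.data, site.data$site)

# Write one PDF per site into a temporary directory
pdf.files <- character(0)
for (key in names(blocks)) {
  block <- blocks[[key]]
  f <- file.path(tempdir(), paste0(key, ".pdf"))
  pdf(f)
  plot(block$year, block$peak, main = key)
  dev.off()
  pdf.files <- c(pdf.files, f)
}
```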

As a bonus, if you have a multiprocessor machine and call registerDoMC() first (from the "doMC" package), the foreach iterations will run in parallel, speeding things up. More details are in this Revolutions blog post: Block-processing a data frame with isplit.
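For comparison, a similar fork-based parallel loop can be sketched with mclapply() from base R's parallel package instead of foreach plus doMC. This swaps in a different technique; the data are hypothetical, and since forking (mc.cores > 1) is not supported on Windows, the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical example data (site, year, peak)
site.data <- data.frame(
  site = c("A", "A", "B", "B", "B", "C"),
  year = c(2001, 2002, 2001, 2002, 2003, 2002),
  peak = c(10.5, 12.0, 7.2, 8.8, 9.1, 15.3)
)

blocks <- split(site.data, site.data$site)

# mclapply() forks worker processes; mc.cores > 1 is not supported on Windows
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Process each block in a worker; here each block is reduced to its maximum peak
max.peak <- mclapply(blocks, function(b) max(b$peak), mc.cores = n.cores)
unlist(max.peak)
```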

David Smith 2009-09-08 17:29:18

Answer 2

+3 A:

Christopher DuBois 2009-09-08 17:41:31

Answer 3

+1 A:

There are two handy built-in functions for dealing with this kind of situation: aggregate() and by() (see ?aggregate and ?by). In this case, because you want a plot and aren't returning a scalar, use by():

data <- read.table("example.txt",header=TRUE)

by(data[, c('year', 'peak')], data$site, plot)

The output prints NULL for each site because that's what plot() returns. You might want to open a pdf() graphics device first to capture all the plots in a file.
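A minimal sketch of both suggestions, using made-up data: a pdf() device opened before by() collects one page per site, and aggregate() covers the scalar case (here the maximum peak per site, an illustrative choice). The file name all-sites.pdf is hypothetical.

```r
# Hypothetical example data (site, year, peak)
site.data <- data.frame(
  site = c("A", "A", "B", "B", "B", "C"),
  year = c(2001, 2002, 2001, 2002, 2003, 2002),
  peak = c(10.5, 12.0, 7.2, 8.8, 9.1, 15.3)
)

# Open a PDF device first so every plot drawn inside by() lands in one file,
# one page per site (a two-column data frame is plotted as a scatter plot)
pdf.file <- file.path(tempdir(), "all-sites.pdf")
pdf(pdf.file)
res <- by(site.data[, c("year", "peak")], site.data$site, plot)
invisible(dev.off())

# When each block reduces to a scalar, aggregate() returns a data frame instead
agg <- aggregate(peak ~ site, data = site.data, FUN = max)
agg
```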

Peter 2009-09-08 19:05:00

Answer 4

+2 A:

Here's what I would do, although it looks like you guys have it handled by library functions.

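sites <- unique(data$site)
for (i in seq_along(sites)) {
  # rows of the i-th distinct site; the trailing comma selects rows, not columns
  constrainedData <- data[data$site == sites[i], ]
  doSomething(constrainedData)  # placeholder for your own per-block processing
}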

This kind of code is more direct and might be less efficient, but I prefer being able to read what it is doing over learning some new library function for the same thing. It also feels more flexible, but in all honesty this is just the way I figured it out as a novice.
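A runnable version of the loop idea, assuming made-up data and counting rows per site in place of the hypothetical doSomething(), might look like this. The trailing comma in the subset is what selects matching rows while keeping all columns.

```r
# Hypothetical example data (site, year, peak)
data <- data.frame(
  site = c("A", "A", "B", "B", "B", "C"),
  year = c(2001, 2002, 2001, 2002, 2003, 2002),
  peak = c(10.5, 12.0, 7.2, 8.8, 9.1, 15.3)
)

sites <- unique(data$site)
n.rows <- integer(length(sites))
for (i in seq_along(sites)) {
  # Logical row index plus trailing comma: keep matching rows, all columns
  constrainedData <- data[data$site == sites[i], ]
  n.rows[i] <- nrow(constrainedData)
}
names(n.rows) <- sites
n.rows
```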

Karl 2009-09-08 19:11:42

Answer 5

A:

I seem to recall that plain old split() has a method for data.frames, so that split(data, data$site) would produce a list of blocks, one data frame per site. You could then operate on this list using sapply(), lapply(), or a for loop.
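A short sketch of that approach with made-up data: sapply() simplifies one scalar per block into a named vector, while lapply() keeps arbitrary per-block results in a list. The summaries chosen (mean peak, range of years) are only illustrations.

```r
# Hypothetical example data (site, year, peak)
site.data <- data.frame(
  site = c("A", "A", "B", "B", "B", "C"),
  year = c(2001, 2002, 2001, 2002, 2003, 2002),
  peak = c(10.5, 12.0, 7.2, 8.8, 9.1, 15.3)
)

# A named list of data frames, one per site
blocks <- split(site.data, site.data$site)

# sapply() simplifies one scalar per block into a named vector
mean.peak <- sapply(blocks, function(b) mean(b$peak))

# lapply() keeps arbitrary per-block results in a list
year.range <- lapply(blocks, function(b) range(b$year))
mean.peak
```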

split() is also nice because of unsplit(), which reverses it: given the per-block results and the same grouping factor, it creates a vector the same length as the original data and in the original order.
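A minimal sketch of the split()/unsplit() round trip, using made-up data that is deliberately not sorted by site: each site's peaks are centered on that site's mean, and unsplit() puts the results back in the original row order so they can be stored as a new column.

```r
# Hypothetical example data, deliberately not sorted by site
site.data <- data.frame(
  site = c("B", "A", "C", "B", "A", "B"),
  year = c(2001, 2001, 2002, 2002, 2002, 2003),
  peak = c(7.2, 10.5, 15.3, 8.8, 12.0, 9.1)
)

# Per-site processing: subtract each site's mean peak
centered <- lapply(split(site.data$peak, site.data$site), function(p) p - mean(p))

# unsplit() reassembles the pieces in the original row order
site.data$peak.centered <- unsplit(centered, site.data$site)
site.data
```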

Jake 2009-09-09 21:06:07
