I may be Doing It Wrong™ but I have a dataframe and for each row in that dataframe I have to do some complicated lookups and append some data to a file.

In my procedural world, I'd do something like:

for (row in dataFrame) {
    #look up stuff using data from the row
    #write stuff to the file
}

What is the R way to do this?

Update with more information:

The dataFrame contains scientific results for selected wells from 96-well plates used in biological research, so I want to do something like:

for (well in dataFrame) {
  wellName <- well$name    # string like "H1"
  plateName <- well$plate  # string like "plate67"
  wellID <- getWellID(wellName, plateName)
  cat(paste(wellID, well$value1, well$value2, sep=","), file=outputFile)
}
+4  A: 

You can use the by function:

by(dataFrame, 1:nrow(dataFrame), function(row) dostuff)
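
As a minimal sketch of that call, assuming a small made-up data frame and a placeholder function body (neither comes from the question): with one index value per row, the function receives a one-row data frame each time, so columns can be read by name.

```r
# Hypothetical sample data; column names follow the question.
dataFrame <- data.frame(name = c("H1", "H2"),
                        plate = c("plate67", "plate67"),
                        value1 = c(1.5, 2.5),
                        stringsAsFactors = FALSE)

# One index value per row, so the function is called once per row
# and `row` is a one-row data frame.
res <- by(dataFrame, seq_len(nrow(dataFrame)), function(row) {
  paste(row$name, row$plate, row$value1, sep = ",")
})

# Drop the "by" class and attributes to get a plain character vector.
lines <- as.character(res)
print(lines)
```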

But iterating over the rows directly like this is rarely what you want; you should try to vectorize instead. Can I ask what the actual work in the loop is doing?

Jonathan Chang
updated question w/ more info. thanks!
Carl Coryell-Martin
+6  A: 

You can try this, using the apply() function:

> d
  name plate value1 value2
1    A    P1      1    100
2    B    P2      2    200
3    C    P3      3    300

> f <- function(x, output) {
 wellName <- x[1]
 plateName <- x[2]
 wellID <- 1
 # x is a character vector here: apply() converts the data frame to a matrix first
 print(paste(wellID, x[3], x[4], sep=","))
 cat(paste(wellID, x[3], x[4], sep=","), file=output, append=TRUE, fill=TRUE)
}

> apply(d, 1, f, output = 'outputfile')
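
One thing to keep in mind with this approach: apply() converts the data frame to a matrix before looping, so when the columns have mixed types every value reaches the function as a string. A small sketch, rebuilding the d shown above (the construction itself is an assumption, since only the printed output is given):

```r
d <- data.frame(name = c("A", "B", "C"),
                plate = c("P1", "P2", "P3"),
                value1 = c(1, 2, 3),
                value2 = c(100, 200, 300),
                stringsAsFactors = FALSE)

# Each row arrives as a named character vector, even the numeric columns.
types <- apply(d, 1, function(x) class(x))
print(types)

# Convert back explicitly when arithmetic is needed.
sums <- apply(d, 1, function(x) as.numeric(x["value1"]) + as.numeric(x["value2"]))
print(sums)
```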
knguyen
this was a very clear explanation thank you.
Carl Coryell-Martin
+3  A: 

First, Jonathan's point about vectorizing is correct. If your getWellID() function is vectorized, then you can skip the loop and just use cat or write.csv:

write.csv(data.frame(wellid=getWellID(well$name, well$plate), 
         value1=well$value1, value2=well$value2), file=outputFile)
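
To make that concrete, here is one way a vectorized getWellID() might look. The lookup table, its column names, and the IDs are all hypothetical; the real lookup is not shown in the question.

```r
# Hypothetical lookup table; the real IDs would come from elsewhere.
wellTable <- data.frame(id = c(101, 102, 103),
                        name = c("H1", "H2", "H1"),
                        plate = c("plate67", "plate67", "plate68"),
                        stringsAsFactors = FALSE)

# Vectorized stand-in for getWellID(): match on a combined name/plate key.
getWellID <- function(wellName, plateName) {
  key <- paste(wellName, plateName, sep = "|")
  tableKey <- paste(wellTable$name, wellTable$plate, sep = "|")
  wellTable$id[match(key, tableKey)]
}

# Made-up results for two wells.
well <- data.frame(name = c("H1", "H1"),
                   plate = c("plate68", "plate67"),
                   value1 = c(0.1, 0.2),
                   value2 = c(10, 20),
                   stringsAsFactors = FALSE)

out <- data.frame(wellid = getWellID(well$name, well$plate),
                  value1 = well$value1, value2 = well$value2)
outputFile <- tempfile(fileext = ".csv")
write.csv(out, file = outputFile, row.names = FALSE)
```

Because match() works on whole vectors, all IDs are resolved in one call and the file is written once.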

If getWellID() isn't vectorized, then Jonathan's recommendation of using by or knguyen's suggestion of apply should work.
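
Another option for a lookup that only handles one well at a time is mapply(), which calls the function once per name/plate pair. A rough sketch, where the scalar-only getWellID() is a made-up stand-in:

```r
# Hypothetical scalar-only lookup: it refuses more than one well at a time.
getWellID <- function(wellName, plateName) {
  stopifnot(length(wellName) == 1, length(plateName) == 1)
  paste(plateName, wellName, sep = "_")
}

well <- data.frame(name = c("H1", "H2"),
                   plate = c("plate67", "plate68"),
                   stringsAsFactors = FALSE)

# mapply() calls getWellID once per (name, plate) pair and collects the results.
ids <- mapply(getWellID, well$name, well$plate, USE.NAMES = FALSE)
print(ids)
```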

Otherwise, if you really want to use for, you can do something like this:

for(i in seq_len(nrow(dataFrame))) {
    row <- dataFrame[i,]
    # do stuff with row
}
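
Filled in for the well example from the question, the loop could look roughly like this. The sample data and the getWellID() stand-in are assumptions made for illustration only.

```r
dataFrame <- data.frame(name = c("H1", "H2"),
                        plate = c("plate67", "plate67"),
                        value1 = c(1, 2),
                        value2 = c(10, 20),
                        stringsAsFactors = FALSE)

# Hypothetical stand-in for the real lookup.
getWellID <- function(wellName, plateName) paste(plateName, wellName, sep = "_")

outputFile <- tempfile(fileext = ".csv")

# seq_len() yields an empty sequence for a zero-row data frame,
# whereas 1:nrow(dataFrame) would give c(1, 0).
for (i in seq_len(nrow(dataFrame))) {
  row <- dataFrame[i, ]
  wellID <- getWellID(row$name, row$plate)
  # append = TRUE adds to the file instead of overwriting it each time.
  cat(paste(wellID, row$value1, row$value2, sep = ","), "\n",
      file = outputFile, sep = "", append = TRUE)
}

written <- readLines(outputFile)
print(written)
```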

You can also try to use the foreach package, although it requires you to become familiar with that syntax. Here's a simple example:

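library(foreach)
library(iterators)  # provides iter()
d <- data.frame(x=1:10, y=rnorm(10))
# %dopar% needs a registered parallel backend (e.g. via doParallel); otherwise it runs sequentially with a warning
s <- foreach(row=iter(d, by='row'), .combine=rbind) %dopar% row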

A final option is to use a function out of the plyr package, in which case the convention will be very similar to the apply function.

library(plyr)
ddply(dataFrame, .(x), function(df) {
    # do stuff with df, the subset of rows sharing one value of x
})
Shane
Shane, thank you. I'm not sure how to write a vectorized getWellID. What I need to do right now is to dig into an existing list of lists to look it up or pull it out of a database.
Carl Coryell-Martin
Feel free to post the getWellID question (i.e. can this function be vectorized?) separately, and I'm sure I (or someone else) will answer it.
Shane
Even if getWellID is not vectorized, I think you should go with this solution, and replace getWellID with `mapply(getWellID, well$name, well$plate)`.
Jonathan Chang
Even if you pull it from a database, you can pull them all at once and then filter the result in R; that will be faster than an iterative function.
Shane