I am using the statistics package R and would like to loop through column[x] in all the rows of a data frame, apply a function to the data in each cell of that column, and pass the result to a new column (with each calculated result aligned with the corresponding data in column[x]).

Two problems: 1) I can't get it to work, and 2) looping seems to be discouraged in the R articles I've read. Is there an alternative approach, and if not, does anyone have an example of how to carry out the loop? Thanks.

+1  A: 

Looping is not necessarily discouraged.

Get it working first, and only then think about getting it faster.

Dirk Eddelbuettel
+2  A: 

Without any examples, it's hard to know how to respond. The basic case of what you're describing, however, is this:

#Just a very simple data frame
dat <- data.frame(x = c(1, 2, 3))
#Compute the squared value of each value in x
dat$y <- dat$x^2
#See the resultant data.frame, now with column y
dat

When you tell R to square a vector (or vector-like structure, like dat$x), it knows to square each value separately. You don't need to explicitly loop over those values most of the time - although, as Dirk notes, you should only worry about optimizing your loops if they are causing you problems. That said, I certainly prefer reading and writing

dat$y <- dat$x^2

to:

for (i in seq_along(dat$x)) {
  dat$y[i] <- dat$x[i]^2
}

... where possible.

Matt Parker
Thanks. I can get arithmetic working OK. I'm not able to pass the contents of a data frame to a function. Here's the problem. Here's the top of the frame (called data) with headings "compound" and "SMILES" (SMILES are a text representation of a molecule):

Compound_ID SMILES
12345 c1cccccc1

I want to use the function parse.smiles() to read in the SMILES and output a molecule. If I do it on one molecule it's OK:

junk <- "c1ccccc1"
parse.smiles(junk)

If I do

sp <- get.smiles.parser()
junk <- sapply(data$smiles, parse.smiles, parser = sp)

it can't interpret the SMILES.
Andy
Okay. Sorry, wasn't quite sure where you were in R, so I thought I'd just throw the basic case out there. "unknown" might have it - but if not, your best bet is to post a little sample data set and the function. Kind of hard to grasp what's going wrong from a description, and I, at least, can't get any *apply functions right without experimentation.
Matt Parker
A: 

If parse.smiles() is a function you want to apply to every entry of a vector "vec", then you can use:

lapply(vec, parse.smiles)
Thanks everyone. The column I was interested in had been read in as a factor. I had to explicitly tell the function to read in the contents as characters, and I had not realised this until it was pointed out. It's now working.
Andy
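
The factor problem described in the comment above can be reproduced on toy data. This is only a sketch: the data frame and its second compound are made up for illustration, and nchar() is a stand-in for any function that insists on character input (it is not the function from the question).

```r
# Hypothetical toy data; stringsAsFactors = TRUE mimics reading text in as a factor
data <- data.frame(Compound_ID = c(12345, 12346),
                   SMILES = c("c1ccccc1", "CCO"),
                   stringsAsFactors = TRUE)

# Check how the column was actually stored
is.factor(data$SMILES)

# Convert the factor to plain character strings before applying the function
data$SMILES <- as.character(data$SMILES)

# nchar() is only a stand-in for a function that needs character input
data$n_chars <- sapply(data$SMILES, nchar, USE.NAMES = FALSE)
data
```

Checking str(data) right after reading a file is a quick way to spot columns that came in as factors.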
+1  A: 

The only reason looping is discouraged is that it is slow. R is designed to work on whole vectors at a time and has lots of functions to accomplish this: the whole apply family, as well as helpers like Vectorize. So the idiom is that if you're using for loops you're not thinking in R, but sometimes for loops really are appropriate.

To do this in the R way of thinking, vectorize your function if it is not already vectorized (see the Vectorize function), then call that function with the entire column as an argument and assign the result to the new column.

# my_fun is a placeholder for your own function(x, ...)
f <- Vectorize(my_fun, 'x')
data$newcolumn <- f(data[, 1])

The apply family (apply, sapply, lapply, mapply, tapply) are also alternatives. Most native R functions are already vectorized, but be careful when passing extra arguments that are supposed to be interpreted as vectors.
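
As a rough illustration of the approach above, here is a minimal sketch using a made-up function, classify(), which only handles one value at a time because `if` expects a single TRUE/FALSE. The function and column names are assumptions for the example, not part of the question.

```r
# Hypothetical scalar-only function: `if` expects a single TRUE/FALSE
classify <- function(x) if (x > 2) "high" else "low"

dat <- data.frame(x = c(1, 2, 3))

# Wrap the function so it is applied to each element of x in turn
classify_v <- Vectorize(classify, "x")
dat$label <- classify_v(dat$x)

# The same result with sapply, without creating a wrapper
dat$label2 <- sapply(dat$x, classify)
dat
```

Both versions still call classify() once per element, so they are mainly a readability gain over an explicit for loop rather than a speed-up.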

Andrew Redd