ansaurus

Question

Applying a function on each row of a data frame in R

Answer 1

A:

It sounds like you want to use subset:

subset(orig.df,grepl("ave",name))

The second argument is a logical expression, evaluated using the data frame's columns, that determines which rows are kept. You can make this expression use values from as many columns as you want, e.g. grepl("ave",name) & size>50.
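A small sketch of that idea follows. The data frame is hypothetical, because the question's real data is not shown here. It also shows that grepl("ave", name) matches "ave" anywhere in the string, and that anchoring the pattern with $ restricts the match to names ending in "ave".

```r
# Hypothetical example data: the question's real data frame is not shown
orig.df <- data.frame(
  id   = 1:4,
  size = c(100, 30, 40, 80),
  name = c("dave", "bob", "wave", "avery"),
  stringsAsFactors = FALSE
)

# grepl() matches "ave" anywhere in the string, so "avery" is kept too
by_name <- subset(orig.df, grepl("ave", name))

# Conditions on several columns combine element-wise with &
by_name_and_size <- subset(orig.df, grepl("ave", name) & size > 50)

# Anchoring the pattern with $ keeps only names that end in "ave"
ends_in_ave <- subset(orig.df, grepl("ave$", name))

print(by_name)
print(by_name_and_size)
print(ends_in_ave)
```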

James 2010-09-06 11:08:21

Answer 2

A:

You may have to use lapply instead of apply to force the result to be a list.

> rhymesWithBrave <- function(x) substring(x,nchar(x)-2) =="ave"
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+                      if(rhymesWithBrave(dfr[i,"name"])) dfr[i,] else NULL,
+                      dfr))
  id size name
1  1  100 dave
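One reason apply is awkward here is that it converts a data frame to a matrix with as.matrix before looping. When any column is character, every row therefore arrives as a character vector. Looping over row indices with lapply instead passes each row as a one-row data frame, so the column types stay intact. The sketch below uses a hypothetical dfr in which only the dave row matches the transcript.

```r
# Hypothetical data frame; only the "dave" row appears in the transcripts
dfr <- data.frame(id = 1:3, size = c(100, 50, 20),
                  name = c("dave", "bob", "steve"),
                  stringsAsFactors = FALSE)

# apply() coerces dfr to a character matrix first, so every row is character
row_classes <- apply(dfr, 1, function(r) class(r))
print(row_classes)

# lapply() over row indices yields one-row data frames with original types
rows <- lapply(seq_len(nrow(dfr)), function(i) dfr[i, ])
print(class(rows[[1]]))
print(class(rows[[1]]$size))
```

seq_len(nrow(dfr)) is used rather than 1:nrow(dfr) because, for a data frame with zero rows, 1:0 evaluates to c(1, 0) and the loop would run twice on rows that do not exist.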

But in this case, subset would be more appropriate:

> subset(dfr,rhymesWithBrave(name))
  id size name
1  1  100 dave

If you want to perform additional transformations before returning the result, you can go back to the lapply approach above:

> add100tosize <- function(x) within(x,size <- size+100)
> do.call(rbind,lapply(1:nrow(dfr),function(i,dfr)
+                      if(rhymesWithBrave(dfr[i,"name"])) add100tosize(dfr[i,])
+                      else NULL,dfr))
  id size name
1  1  200 dave

Or, in this simple case, apply the function to the output of subset.

> add100tosize(subset(dfr,rhymesWithBrave(name)))
  id size name
1  1  200 dave
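R's own ?subset help describes subset as a convenience function for interactive use and recommends standard indexing with [ for programming. Here is a minimal sketch of the same filter-then-transform step written with [. It reuses this answer's rhymesWithBrave and add100tosize. The three-row dfr is hypothetical: only the dave row comes from the transcripts.

```r
rhymesWithBrave <- function(x) substring(x, nchar(x) - 2) == "ave"
add100tosize <- function(x) within(x, size <- size + 100)

# Hypothetical data frame; only the "dave" row appears in the transcripts
dfr <- data.frame(id = 1:3, size = c(100, 50, 20),
                  name = c("dave", "bob", "steve"),
                  stringsAsFactors = FALSE)

# Build the logical row selector explicitly, then index with it.
# drop = FALSE keeps a data frame even if dfr had only one column.
keep <- rhymesWithBrave(dfr$name)
result <- add100tosize(dfr[keep, , drop = FALSE])
print(result)
```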

UPDATE:

To select rows that do not fall between any start and end pair, you might construct a different function. (Note: when logical vectors are summed, TRUE values are converted to 1s and FALSE values to 0s, so rowSums counts how many intervals contain each value, and a count of 0 means the value lies outside all of them.)

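test <- function(x)
  # x must be passed via MoreArgs; otherwise the inner function's x is a missing argument
  rowSums(mapply(function(start,end,x) x >= start & x <= end,
                 start=c(100,250,698,1988),
                 end=c(200,400,1520,2147),
                 MoreArgs=list(x=x))) == 0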

subset(dfr,test(size))
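To see the counting trick at work, this sketch builds the intermediate matrix for a few hypothetical sizes. The intervals are the ones from the update, and the sizes are invented. Each column corresponds to one interval and each row to one size, so rowSums counts the intervals that contain each size. One caveat: with a single value (for example, a one-row data frame), mapply returns a plain vector rather than a matrix, and rowSums would then fail.

```r
starts <- c(100, 250, 698, 1988)
ends   <- c(200, 400, 1520, 2147)
sizes  <- c(150, 220, 700, 2000, 50)   # hypothetical sizes

# Summing logicals counts the TRUEs
print(sum(c(TRUE, FALSE, TRUE)))   # 2

# 5 x 4 logical matrix: TRUE where a size lies inside an interval
hits <- mapply(function(start, end) sizes >= start & sizes <= end,
               starts, ends)

# Number of intervals containing each size; 0 means outside all of them
counts  <- rowSums(hits)
outside <- counts == 0
print(counts)
print(outside)
```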

Stephen 2010-09-06 11:20:33

+1 Thanks, please see update.

David B 2010-09-06 11:41:28

Answer 3

A:

For the more general case of processing a data frame, get the plyr package from CRAN and look at, for example, the ddply function.

install.packages("plyr")
library(plyr)
help(ddply)

It does what you want without masses of fiddling.

For example...

> d
    x          y           z xx
1   1 0.68434946 0.643786918  8
2   2 0.64429292 0.231382912  5
3   3 0.15106083 0.307459540  3
4   4 0.65725669 0.553340712  5
5   5 0.02981373 0.736611949  4
6   6 0.83895251 0.845043443  4
7   7 0.22788855 0.606439470  4
8   8 0.88663285 0.048965094  9
9   9 0.44768780 0.009275935  9
10 10 0.23954606 0.356021488  4

We want to compute the mean and sd of x within groups defined by "xx":

> ddply(d,"xx",function(r){data.frame(mean=mean(r$x),sd=sd(r$x))})
  xx mean        sd
1  3  3.0        NA
2  4  7.0 2.1602469
3  5  3.0 1.4142136
4  8  1.0        NA
5  9  8.5 0.7071068

And it gracefully handles all the nasty edge cases that sometimes catch you out.

Spacedman 2010-09-06 11:36:34

Could you explain how to use it? AFAICT, it works on columns, not rows.

David B 2010-09-06 11:47:14

There's plenty of documentation for plyr, both in the package's own help and elsewhere. The ddply function takes a data frame, a grouping variable, and a function; it splits the data frame by the grouping variable and calls the function on each piece. The results are then combined back into a data frame.
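The description above is the split-apply-combine pattern. The following is a rough base-R sketch of that idea using split, lapply, and do.call(rbind, ...). It is not plyr's actual implementation and lacks ddply's extra handling. It reproduces the grouped summary from the answer, with d rebuilt from the x and xx columns printed earlier (y and z are not needed).

```r
# x and xx columns of the example data frame d shown above
d <- data.frame(x = 1:10, xx = c(8, 5, 3, 5, 4, 4, 4, 9, 9, 4))

# Split: one data frame per value of xx (groups come out in sorted order)
pieces <- split(d, d$xx)

# Apply: summarise each piece; sd() of a single value is NA
summaries <- lapply(pieces, function(r)
  data.frame(mean = mean(r$x), sd = sd(r$x)))

# Combine: stack the summaries and restore the grouping value as a column
result <- do.call(rbind, summaries)
result <- cbind(xx = as.numeric(names(summaries)), result)
rownames(result) <- NULL
print(result)
```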

Spacedman 2010-09-06 13:06:37

The help is actually very short. How can I split the data frame into rows? Do I have to add a dummy column with a unique id?

David B 2010-09-06 13:19:41
