I'm trying to write a function to accept a data.frame (x) and a column from it. The function performs some calculations on x and later returns another data.frame. I'm stuck on the best-practices method to pass the column name to the function.

The two minimal examples fun1 and fun2 below produce the desired result, being able to perform operations on x$column, using max() as an example. However, both rely on the seemingly (at least to me) inelegant (1) call to substitute() and possibly eval() and (2) the need to pass the column name as a character vector.

fun1 <- function(x, column){
  do.call("max", list(substitute(x[a], list(a = column))))
}

fun2 <- function(x, column){
  max(eval((substitute(x[a], list(a = column)))))
}

df <- data.frame(B = rnorm(10))
fun1(df, "B")
fun2(df, "B")

I would like to be able to call the function as fun(df, B), for example. Other options I have considered but have not tried:

  • Pass column as an integer of the column number. I think this would avoid substitute(). Ideally, the function could accept either.
  • with(x, get(column)), but even if it works, I think this would still require substitute().
  • Make use of formula() and match.call(), neither of which I have much experience with.
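
The first option in the list above can be sketched without substitute(): the `[[` operator accepts either a column name or a column position. This is only a minimal illustration; `col_max` is a hypothetical helper name, and max() stands in for the real calculation.

```r
# Hypothetical helper: `column` may be a single name (character)
# or a single position (integer).
col_max <- function(x, column) {
  stopifnot(is.data.frame(x), length(column) == 1)
  max(x[[column]])  # `[[` extracts one column by name or by position
}

df <- data.frame(A = 1:10, B = 2:11)
col_max(df, "B")  # select by name
col_max(df, 2L)   # select by position
```

Both calls select the same column, so they return the same value.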

Subquestion: Is do.call() preferred over eval()?

Thanks, Kevin

+3  A: 

You can just use the column name directly:

df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column){
  max(x[,column])
}
fun1(df, "B")
fun1(df, c("B","A"))

There's no need to use substitute, eval, etc.

You can even pass the desired function as a parameter:

fun1 <- function(x, column, fn) {
  fn(x[,column])
}
fun1(df, "B", max)
Shane
Is there any way to pass the column name not as a string?
Kevin
You need to either pass the column name quoted as a character or the integer index for the column. Just passing `B` will assume that B is an object itself.
Shane
I see. I'm not sure how I ended up with the convoluted substitute, eval, etc.
Kevin
Are you looking for the max value of all the values from a vector "B" that's contained in another vector "A"? Or are you looking to apply a function to a set of columns from a data frame?
Shane
The function actually has nothing to do with max. It was just what came to mind when I was trying to write an example.
Kevin
You could also look at the `subset.data.frame` code to see how column selection is done.
Marek
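
As a rough sketch of the idea behind the column selection in `subset.data.frame`: the unquoted expression is evaluated in a list that maps each column name to its position, so `B` evaluates to 2 and `c(A, C)` to c(1, 3). The function name `select_cols` is made up for this illustration, and the code is a simplified imitation rather than the actual base R source.

```r
# Hypothetical helper imitating the `select` handling of subset().
select_cols <- function(x, select) {
  # Map each column name to its position: list(A = 1, B = 2, ...)
  nl <- as.list(seq_along(x))
  names(nl) <- names(x)
  # Evaluate the unquoted expression with column names standing for positions;
  # anything else (such as c or :) is looked up in the caller's frame.
  vars <- eval(substitute(select), nl, parent.frame())
  x[, vars, drop = FALSE]
}

df <- data.frame(A = 1:10, B = 2:11, C = 3:12)
names(select_cols(df, B))        # "B"
names(select_cols(df, c(A, C)))  # "A" "C"
names(select_cols(df, A:B))      # "A" "B"
```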
A: 

Personally I think that passing the column as a string is pretty ugly. I like to do something like:

get.max <- function(column, data = NULL) {
    # Evaluate the unquoted argument inside `data`, falling back to the caller's frame
    column <- eval(substitute(column), data, parent.frame())
    max(column)
}

which will yield:

> get.max(mpg,mtcars)
[1] 33.9
> get.max(c(1,2,3,4,5))
[1] 5

Notice that specifying a data.frame is optional. You can even work with functions of your columns:

> get.max(1/mpg,mtcars)
[1] 0.09615385
Ian Fellows
You need to get out of the habit of thinking using quotes is ugly. Not using them is ugly! Why? Because you've created a function that can only be used interactively - it's very difficult to program with it.
hadley
I'm happy to be shown a better way, but I fail to see the difference between this and qplot(x=mpg,data=mtcars). ggplot2 never passes a column as a string, and I think it is better off for it. Why do you say that this can only be used interactively? Under what situation would it lead to undesirable results? How is it more difficult to program with? In the body of the post I display how it is more flexible.
Ian Fellows
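
One way to see the trade-off discussed in the comments above is to pick the column at run time, with its name stored in a variable. The sketch below reuses get.max from the answer; `max_by_name` and `target` are hypothetical names introduced only for the comparison.

```r
# get.max as defined in the answer above (unquoted column).
get.max <- function(column, data = NULL) {
  column <- eval(substitute(column), data, parent.frame())
  max(column)
}

# Hypothetical string-based counterpart.
max_by_name <- function(data, column) max(data[[column]])

target <- "mpg"  # column chosen at run time

# `target` is not a column of mtcars, so it is found in the calling frame
# and max() is applied to the string itself.
get.max(target, mtcars)      # "mpg"

# The string version uses the value of `target` as the column name.
max_by_name(mtcars, target)  # 33.9
```

Interactive calls such as get.max(mpg, mtcars) still work as shown in the answer; the difference only appears when the column name is held in a variable.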