My program takes a data.frame and crunches the numbers. At one point, values from the j-th column are multiplied by a predefined value that depends on the column name (species name, actually; it's an ecological index). So far, I've been providing these values via a second data.frame and matching on column names. What would be an efficient way of integrating fixed values within a function? I would like my program to be as portable as possible, without the need for a second data.frame file.

EDIT

This is the function. I'm trying to improve the second line (index <- read.table...) so that it would not depend on the outside source.

macroIndex <- function(obj, index) {
    index <- read.table("conv.csv", header=T, dec=",")
    a <- c()
    b <- names(obj)
    for (i in 2:length(obj)) {
        obj[i] <- obj[i] * index[which(index==b[i]), 2]
    }
    obj
}

Another solution I tried may not be pretty, but it gets the job done: I use dput(index) to create a permanent object, which I then insert into my function.
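A minimal sketch of that dput() approach, assuming a two-column conversion table. The species names and factors are made up for illustration and are not the real contents of conv.csv:

```r
# Hypothetical conversion table standing in for conv.csv
index <- data.frame(species = c("sp1", "sp2"), value = c(2, 0.5))

# dput() prints R code that recreates the object; that output is pasted below
dput(index)

macroIndex <- function(obj) {
    # Pasted from dput(index), so no external file is needed
    index <- structure(list(species = c("sp1", "sp2"), value = c(2, 0.5)),
                       class = "data.frame", row.names = c(NA, -2L))
    # As in the original loop, the first column of obj is assumed to be an identifier
    for (i in 2:length(obj)) {
        obj[i] <- obj[i] * index$value[match(names(obj)[i], index$species)]
    }
    obj
}

# Hypothetical input: one identifier column followed by species columns
obs <- data.frame(site = c("A", "B"), sp1 = c(1, 2), sp2 = c(4, 6))
result <- macroIndex(obs)
```

The function now carries its own copy of the table, at the cost of having to re-paste the dput() output whenever the values change.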

A: 

Hi there,

1) Consider moving to a matrix instead of a data.frame for faster results.

2) Could you supply some simple code to explain what you want to achieve?

Tal Galili
I've updated my post and included the function. I suspect that the data.frames used for calculating this ecological index will not grow large enough to cause a speed issue.
Roman Luštrik
A: 

Well, you need to map your column names to another value, so you have to store it somehow. I would say that a named list would be a more appropriate data structure, although at the end of the day it doesn't make a big difference.

Here's some sample data:

df <- data.frame(a=1:5, b=2:6)
mapping <- list(a=3, b=4)

Here's a simple example of using the list:

for(i in 1:ncol(df)) df[,i] <- df[,i] * mapping[[colnames(df)[i]]]

Regarding Tal's recommendation for using a matrix: that is true so long as every value in your data frame is of the same type. If you have mixed types, then you need to stick with a data frame.
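If every column is numeric, the matrix route might look like the sketch below. It uses base R's sweep() to multiply each column by the factor matching its name, with the same illustrative values as the sample data and a named numeric vector in place of the list:

```r
m <- as.matrix(data.frame(a = 1:5, b = 2:6))
mapping <- c(a = 3, b = 4)   # named numeric vector instead of a list

# MARGIN = 2 applies the factors column-wise; indexing by colnames(m)
# puts the factors in the same order as the matrix columns
scaled <- sweep(m, 2, mapping[colnames(m)], `*`)
```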

Shane
Without a loop: `df[,] <- lapply(names(df), function(i) df[[i]] * mapping[[i]])`
Marek
A: 

You can use R's lexical scoping to define a function function_maker that returns your desired function func. The code to create the mapping vector is only called when function_maker is called, not when func is. mapping is also owned by func in that other parts of your code can't alter it.

dat <- data.frame(a=c(1,2,3),b=c(3,2,0),c=c(5,6,4))

function_maker <- function(){
    mapping <- c(a=4,b=2,c=5)
    function(df){
        for(i in 1:ncol(df)) df[,i] <- df[,i] * mapping[[colnames(df)[i]]]
        return(df)
    }
}

func <- function_maker()

func(dat)
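A short sketch of what the closure buys you, repeating the definition so it runs on its own. A global variable that happens to be called mapping does not change what func computes, because func looks the name up in the environment where it was created:

```r
function_maker <- function() {
    mapping <- c(a = 4, b = 2, c = 5)
    function(df) {
        for (i in seq_len(ncol(df))) df[, i] <- df[, i] * mapping[[colnames(df)[i]]]
        df
    }
}
func <- function_maker()

# A global object with the same name is ignored by the closure
mapping <- c(a = 0, b = 0, c = 0)
out <- func(data.frame(a = c(1, 2, 3), b = c(3, 2, 0), c = c(5, 6, 4)))

# The enclosed copy can still be inspected through the function's environment
private_mapping <- environment(func)$mapping
```

The enclosed values are protected from accidental overwriting by name, although code that deliberately reaches into environment(func) can still modify them.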
Ian Fellows
A: 

Why not include the second data frame as a parameter to your function call and check whether it was given? If it wasn't, create it inside the function. This way the code works for datasets that match what you do currently, but can be changed to match new datasets.

Something like (sorry I'm not at my PC, so this is untested)

macroIndex <- function(obj, index) {
  if (missing(index)) {
    index <- data.frame(
      # contents of the default data frame here
    )
  }
  a <- c()
  b <- names(obj)
  for (i in 2:length(obj)) {
      obj[i] <- obj[i] * index[which(index==b[i]), 2]
  }
  return(obj)
}
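A runnable sketch of this idea, using missing() to test whether the argument was supplied. The default table holds made-up species names and factors purely for illustration, and the first column of obj is assumed to be an identifier, as in the loop above:

```r
macroIndex <- function(obj, index) {
  if (missing(index)) {
    # Hypothetical default values, for illustration only
    index <- data.frame(species = c("sp1", "sp2"), value = c(2, 0.5))
  }
  for (i in 2:length(obj)) {
    # Look up the row whose name (column 1) matches the current column of obj
    obj[i] <- obj[i] * index[match(names(obj)[i], index[[1]]), 2]
  }
  obj
}

obs <- data.frame(site = c("A", "B"), sp1 = c(1, 2), sp2 = c(4, 6))
with_default <- macroIndex(obs)   # falls back to the built-in table
with_custom <- macroIndex(obs, data.frame(species = c("sp1", "sp2"), value = c(10, 10)))
```

Callers who know nothing about the conversion table can ignore the second argument entirely, while others can still override it.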
PaulHurleyuk
That was one of the options I was considering. What I'm trying to do is make the function take as few arguments as possible, so that it is usable by people who are less familiar with R. I don't see that as a big obstacle at the moment, but I will keep it in mind.
Roman Luštrik