How do I add the values from many variables? If I had just two variables (columns), I could simply write

summation.variable <- variable1 + variable2

or if it was all in a dataframe

transform(dataframe,summation.col = column1 + column2)

How do I do it if I have about 10 variables and do not want to type each one, as in col1+col2+col3+col4? To make matters worse, my columns have quite long names, and the exact columns I use can change from time to time. I have a character vector with all the relevant column names in it but cannot think how to use it.

The following is useless, since it adds every value in every column and every row and gives a single value for the whole lot.

sum(metrics)
+4  A: 

You want to use rowSums (note the indexing with a character vector):

tmp <- data.frame(a=1:2,b=3:4,d=5:6)
rowSums(tmp[,c("a","d")])

or, more generally, apply:

apply(tmp[,c("a","d")], 1, sum)
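Since the question mentions a character vector of column names that can change, here is a minimal sketch of that case. The names metrics and total are only illustrative, and tmp is the toy data frame from above, repeated so the snippet runs on its own.

```r
tmp <- data.frame(a = 1:2, b = 3:4, d = 5:6)

# Illustrative name: the character vector holding the columns to add up.
metrics <- c("a", "d")

# drop = FALSE keeps a data frame even if metrics has a single element;
# rowSums rejects a plain vector.
tmp$total <- rowSums(tmp[, metrics, drop = FALSE])
tmp$total  # 6 and 8
```

Changing which columns are summed then only means changing metrics.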
Eduardo Leoni
+1  A: 

I just found the answer. I knew I wanted some sort of sum, so I looked up "sum" in the R help, and there it was: follow the link from "colSums" to "rowSums". If metrics is a character vector of the relevant column names, the following line produces a vector in which the numbers are added across each row.

rowSums(dataframe[metrics])  # dataframe stands for your own data frame

How would one do it if one wanted the values in each row multiplied together instead? I do not see a rowProducts.

Farrel
I think I'd use the apply function to do products (or some other function), check ?apply
Kiar
rowSums is a more efficient version of apply for summations
Thierry
+4  A: 

There are many ways to do this kind of operation (i.e. apply a function across a row or column), but as Eduardo points out, apply is the most basic:

tmp <- data.frame(a=1:2,b=3:4,d=5:6)
apply(tmp, 1, prod)

This is a very flexible function. For instance, you can do both operations at once with this call:

apply(tmp, MARGIN=1, function(x) c(sum(x), prod(x)))

Performing the same analysis across columns is also simple (the MARGIN parameter describes whether you use rows or columns):

apply(tmp, MARGIN=2, function(x) c(sum(x), prod(x)))
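One thing to watch: when the function returns more than one value, apply stores the results for each row (or column) as a column of its output, so the MARGIN=1 result comes back with one column per input row. A small sketch, assuming you want one output row per input row; the names sum and prod on the returned vector are just labels chosen here:

```r
tmp <- data.frame(a = 1:2, b = 3:4, d = 5:6)

# Naming the returned values labels the rows of the result.
res <- apply(tmp, MARGIN = 1, function(x) c(sum = sum(x), prod = prod(x)))

# res has one column per row of tmp; transpose to get one row per row of tmp.
by_row <- t(res)
by_row[, "sum"]   # row sums: 9 and 12
by_row[, "prod"]  # row products: 15 and 48
```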
Shane
+3  A: 

In reply to Farrel's answer:

Searching RSeek for rowProd, I found two packages: matrixStats and fUtilities. You could have a look at them.

The second solution is a bit tricky: you can build an expression as text and evaluate it.

X <- structure(list(
    varA = c(0.98, 0.75, -0.56, -1.43, 0.65, -1.15, -1.52, 0.1, 0.06, 0.76),
    varB = c(-0.12, -0.6, 0.62, 0.9, -0.44, 0.37, 0.62, 0.76, -1.61, -0.26),
    varC = c(-0.5, -0.37, -0.43, -0.7, 0.83, -0.24, -0.57, 0.05, -1.31, 0.7),
    varD = c(-0.06, -0.11, 1.03, -1.76, -0.42, -1.21, -0.62, -1, -1.16, 2.13),
    varE = c(-1.96, 0.69, -1.85, -1.74, -1.47, 1.24, 0.29, -1.18, 0.89, 0.42),
    varF = c(0.29, -0.22, -1.29, 1.19, 0.38, -0.23, -0.5, -1.07, -1.83, 0.58),
    varG = c(0.59, -0.41, -1.37, 0.89, -0.75, 0.95, 0.95, -0.9, 0.71, -1.3)
  ),
  .Names = c("varA", "varB", "varC", "varD", "varE", "varF", "varG"),
  row.names = c(NA, -10L), class = "data.frame"
)

metrics <- c("varB","varC","varF")

eval(
  parse( text = paste(metrics,collapse=" * ") ),
  envir = X
)

Some explanations:

  • paste creates a string that looks like varB * varC * varF (collapse concatenates the elements of the vector with the given separator)
  • parse converts the text to an expression
  • eval with envir=X evaluates the expression within X

For your original question you could use collapse="+".

edit: if your variables aren't in a data.frame then eval without envir is enough.
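As a quick check of the collapse="+" variant, the sketch below builds the expression from a character vector and compares the result with rowSums. It also shows Reduce, a base R function not mentioned in this answer, as one possible way to combine the columns pairwise without building code from text. The small data frame X2 is made up for illustration.

```r
# Made-up data for illustration only.
X2 <- data.frame(varA = c(1, 2, 3), varB = c(4, 5, 6), varC = c(7, 8, 9))
metrics <- c("varA", "varC")

# Build the text "varA + varC", parse it, and evaluate it inside X2.
expr_text <- paste(metrics, collapse = " + ")
sum_eval <- eval(parse(text = expr_text), envir = X2)

# The same sums computed with rowSums, for comparison.
sum_rows <- unname(rowSums(X2[, metrics, drop = FALSE]))

# Reduce applies `*` to the selected columns pairwise, element by element.
prod_reduce <- Reduce(`*`, X2[metrics])
```

Here sum_eval and sum_rows should agree, and prod_reduce holds the row-wise products of the selected columns.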

edit2: examples of using rowProds from the packages mentioned:

matrixStats::rowProds(as.matrix(X[,metrics])) # conversion to a matrix is needed
fUtilities::rowProds(X[,metrics]) # works without conversion

I dug into the source of these functions:

  • fUtilities uses apply, so it is the same as apply(X,1,prod) (this is not an efficient solution)
  • matrixStats is smart and does something like exp(rowSums(log(X))), so it should be faster.
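To illustrate the exp(rowSums(log(X))) idea: taken literally it only works for positive values, because log of a negative number is NaN. The sketch below is a hypothetical helper (row_prod_log is a name invented here, not the matrixStats source) that takes logs of the absolute values and restores the sign separately.

```r
# Hypothetical helper illustrating the log-sum idea; not the matrixStats code.
row_prod_log <- function(m) {
  m <- as.matrix(m)
  # An odd number of negative entries in a row makes that row's product negative.
  sgn <- ifelse(rowSums(m < 0) %% 2 == 1, -1, 1)
  # Product of magnitudes via logs: exp(sum(log|x|)) equals prod(|x|).
  sgn * exp(rowSums(log(abs(m))))
}

# Small made-up matrix with rows (1, 3, 5), (-2, 4, -6) and (-3, 5, 7).
m <- matrix(c(1, -2, -3, 3, 4, 5, 5, -6, 7), nrow = 3)

# Compare against the straightforward apply version.
all.equal(row_prod_log(m), apply(m, 1, prod))
```

Because it goes through logarithms, the result can differ from apply(m, 1, prod) by floating-point rounding, hence the comparison with all.equal rather than identical.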

Speed tests:

Xm <- matrix(rnorm(50000*8),ncol=8)
Xd <- as.data.frame(Xm)

require(fUtilities)
require(matrixStats)
system.time( matrixStats::rowProds(as.matrix(Xd)) ) 
#   user  system elapsed 
#   0.08    0.02    0.09 
system.time( matrixStats::rowProds(Xm) )
#   user  system elapsed 
#   0.08    0.00    0.08 
system.time( fUtilities::rowProds(Xd) )
#   user  system elapsed 
#   0.52    0.00    0.52

Even with the conversion to a matrix, the matrixStats version is faster.

Marek
library(fortunes);fortune(106)
Thierry
I wanted to use do.call(f,as.list(X[,metrics])), but I can't find a function that works like f(a,b,c) = a*b*c. Good comment btw ;)
Marek
Look at `prod()`
hadley
prod() computes a[1]*a[2]*...*a[n]*b[1]*b[2]*...*b[n]*c[1]*...*c[n], so it is not what I need.
Marek
@Shane: I disagree. In the help for rowProds there is nothing about time series, and the example uses a matrix (and the function works for a data.frame too).
Marek
Marek: Thanks. You're correct: it expects a matrix as input.
Shane