I have the number of samples per unit and need to calculate statistics with R.

The table looks like this (all rows and columns are actually filled with values; I show only a few here for readability, and there are many more columns):

Hour      1    2    3    4
H1       72   11   98   65
H2       19   27
H3
H4
H5
:
H200000

I.e., in the first hour (H1) there were 72 samples of value 1, 11 samples of value 2, etc. In the second hour (H2) there were 19 samples of value 1, 27 samples of value 2, etc.

I need to calculate the mean and standard deviation per hour (i.e. per row). As there are many thousands of rows I need a fast method.

Example: The manual mean-calculation for hour 1 (H1) would be:

(72×1 + 11×2 + 98×3 + 65×4) / (72 + 11 + 98 + 65) = 648/246 ≈ 2.63

I suppose there are R functions or packages that can do this, but I cannot find them. Your support is highly appreciated.

Thanks, Chris

+2  A: 

You want to calculate a weighted mean, so you need weighted.mean. For the first row:

values  <- c(1, 2, 3, 4)
weights <- c(72, 11, 98, 65)
weighted.mean(values, weights)
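As a quick sanity check, here is a minimal sketch comparing weighted.mean with the manual formula from the question and with the mean of the expanded sample. It assumes the weights are plain sample counts; the variable names are only illustrative.

```r
values <- c(1, 2, 3, 4)
counts <- c(72, 11, 98, 65)   # row H1 from the question

# Manual formula: sum(value * count) / sum(count) = 648 / 246
manual <- sum(values * counts) / sum(counts)

# Built-in weighted mean, with the counts as weights
builtin <- weighted.mean(values, counts)

# Mean of the fully expanded sample (72 ones, 11 twos, ...)
expanded <- mean(rep(values, counts))

print(c(manual = manual, builtin = builtin, expanded = expanded))
```

All three should agree at roughly 2.63, which is the 2.6 from the question.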

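The weighted standard deviation is not well-defined for arbitrary weights. Here, however, the weights are sample counts, so you can compute the standard deviation of the underlying sample directly from the squared deviations about the weighted mean (keep in mind that a standard deviation is only a good summary if the samples in each hour really come from a single Gaussian, i.e. there are no outliers -- not sure if that's the case for your example).

# same values and weights as above
m <- weighted.mean(values, weights)
sqrt(sum(weights * (values - m)^2) / (sum(weights) - 1))  # same as sd(rep(values, weights))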

You should read your data into a table and apply this to every row. Also, "many thousands of rows" is not necessarily a large number for such a simple calculation. This is fairly basic R, so working through a tutorial might also be beneficial.
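A minimal sketch of reading such a table and looping over its rows, assuming a whitespace-separated layout like the one in the question. The inline text stands in for a real file, and the last two counts of H2 are made up for illustration.

```r
# Inline stand-in for the real file; H2's last two counts are invented
txt <- "Hour 1 2 3 4
H1 72 11 98 65
H2 19 27 40 14"

# First column becomes the row names; check.names = FALSE keeps "1", "2", ... as column names
tab <- read.table(text = txt, header = TRUE, row.names = 1, check.names = FALSE)
counts <- as.matrix(tab)
values <- as.numeric(colnames(counts))   # the sample values 1, 2, 3, 4

# Plain loop: one weighted mean per row, with the counts as weights
means <- numeric(nrow(counts))
for (i in seq_len(nrow(counts))) {
  means[i] <- weighted.mean(values, counts[i, ])
}
names(means) <- rownames(counts)
print(means)
```

For a real file, the text argument would be replaced by the file path.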

honk
good answer, but don't iterate, use `apply`
nico
A: 

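Assuming your table is a numeric matrix called dataset with one row per hour holding the counts, and the sample values (1, 2, 3, ...) are in a vector called values, you just need to do:

# The 1 as 2nd parameter indicates to apply the function on the rows.
# Each row holds the counts, so it is passed as the weights (w), not as x.
w.means <- apply(dataset, 1, function(counts) weighted.mean(values, w = counts))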
nico
+1  A: 

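You are much better off (i.e. faster calculations) using matrix operations instead of applying something by row. For example, assuming X is the matrix containing your data (the counts, one row per hour) and the sample values are simply the column indices, you can get the weighted means the following way:

values <- 1:ncol(X)                      # the sample values are the column indices
wmeans <- (X %*% values) / rowSums(X)    # the counts in each row act as the weights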
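The same matrix idea extends to the standard deviation. Below is a minimal sketch, assuming X holds the counts with one row per hour and the sample values are the column indices. The toy matrix, and in particular the last two counts of H2, are made up for illustration.

```r
# Toy count matrix; H1 is from the question, H2's last two counts are invented
X <- rbind(H1 = c(72, 11, 98, 65),
           H2 = c(19, 27, 40, 14))
values <- seq_len(ncol(X))

n      <- rowSums(X)                    # number of samples per hour
wmeans <- drop(X %*% values) / n        # weighted mean per row
wsq    <- drop(X %*% values^2) / n      # weighted mean of squares per row

# Sample standard deviation per row (n - 1 denominator, like sd())
wsd <- sqrt((wsq - wmeans^2) * n / (n - 1))

# Slow reference: expand each row into its raw sample and call sd()
reference <- apply(X, 1, function(counts) sd(rep(values, counts)))
print(cbind(mean = wmeans, sd = wsd, sd_reference = reference))
```

The "mean of squares minus squared mean" form can lose precision when the values are large relative to their spread. For small integer values like these it should be fine, but the row-wise reference is a useful check on a subset of the data.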
Aniko