I have a set of data for which I need to calculate a "consecutive mean" (I don't know if that is the correct name, but I can't find anything better). Here is an example:

ID  Var2 Var3    
1    A    1
2    A    3
3    A    5
4    A    7
5    A    9
6    A    11
7    B    2
8    B    4
9    B    6
10   B    8
11   B    10

Here I need to calculate the mean of every 3 consecutive Var3 values within the same subset, i.e. there will be 4 means calculated for A: mean(1,3,5), mean(3,5,7), mean(5,7,9), mean(7,9,11), and 3 means calculated for B: mean(2,4,6), mean(4,6,8), mean(6,8,10). The result should be:

ID  Var2 Var3 Mean
1    A    1   N/A
2    A    3   N/A
3    A    5   3
4    A    7   5
5    A    9   7
6    A    11  9
7    B    2   N/A
8    B    4   N/A
9    B    6   4
10   B    8   6
11   B    10  8

Currently I am using a "loop-inside-a-loop" approach: I subset the dataset by Var2, and then calculate the mean in another loop, starting from the third row of each subset.

It does what I need, but it is very slow. Is there a faster way to solve this problem?

Thanks!

+3  A: 

It's generally referred to as a "rolling mean" or "running mean". The plyr package allows you to calculate a function over segments of your data and the zoo package has methods for rolling calculations.

> lines <- "ID,Var2,Var3    
+ 1,A,1
+ 2,A,3
+ 3,A,5
+ 4,A,7
+ 5,A,9
+ 6,A,11
+ 7,B,2
+ 8,B,4
+ 9,B,6
+ 10,B,8
+ 11,B,10"
> 
> x <- read.csv(con <- textConnection(lines))
> close(con)
> library(plyr)  # provides ddply
> library(zoo)   # provides rollmean
> 
> ddply(x,"Var2",function(y) data.frame(y,
+   mean=rollmean(y$Var3,3,na.pad=TRUE,align="right")))
   ID Var2 Var3 mean
1   1    A    1   NA
2   2    A    3   NA
3   3    A    5    3
4   4    A    7    5
5   5    A    9    7
6   6    A   11    9
7   7    B    2   NA
8   8    B    4   NA
9   9    B    6    4
10 10    B    8    6
11 11    B   10    8
Joshua Ulrich
Thanks! But what if Var3 is not ordered (the rows should be ordered by ID)?
lokheart
Then order `x` by `ID` first: `x <- x[order(x$ID),]`
Joshua Ulrich
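To make the ordering step concrete, here is a minimal sketch that assumes the rows arrive shuffled. The `shuffled` and `sorted` names and the seed are illustrative only, not part of the original question. Sorting by `Var2` first and then by `ID` keeps each group contiguous and in ID order, which is what a per-group rolling mean relies on.

```r
# Example data from the question, rebuilt as a data frame
x <- data.frame(ID   = 1:11,
                Var2 = rep(c("A", "B"), times = c(6, 5)),
                Var3 = c(1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10))

# Hypothetical shuffled copy, standing in for unordered input
set.seed(1)
shuffled <- x[sample(nrow(x)), ]

# Sort by group first, then by ID within each group
sorted <- shuffled[order(shuffled$Var2, shuffled$ID), ]
rownames(sorted) <- NULL  # discard the shuffled row names
```

After this step `sorted` can be passed to the `ddply` call above in place of `x`.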
+1  A: 

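Alternatively, using base R's tapply (the rolling mean itself still comes from zoo). This assumes the rows of x are already grouped by Var2 in sorted order, because the result is assigned back by position:

x$mean <- unlist(tapply(x$Var3, x$Var2, zoo::rollmean, k=3, na.pad=TRUE, align="right", simplify=FALSE))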
Aaron Statham
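For a variant with no package dependencies, one possible sketch combines `ave()` with `stats::filter()`. The helper name `roll_mean_right` is made up for this example. It assumes every group has at least `k` rows (otherwise `stats::filter()` raises an error) and that rows within each group are already in ID order.

```r
# Example data from the question
x <- data.frame(ID   = 1:11,
                Var2 = rep(c("A", "B"), times = c(6, 5)),
                Var3 = c(1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10))

# Hypothetical helper: mean of the current value and the k - 1 values before it.
# sides = 1 makes the filter look backwards only, so the first k - 1 results are NA.
roll_mean_right <- function(v, k = 3) {
  as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))
}

# ave() applies the helper within each Var2 group and returns the
# results in the original row order, so no unlist() or re-sorting is needed
x$mean <- ave(x$Var3, x$Var2, FUN = roll_mean_right)
```

Because `ave()` writes each group's results back to that group's own row positions, this version does not depend on the groups being stored in sorted order. The values are computed in floating point, so compare them with `all.equal()` rather than `==`.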