Dear Stackers,

I have the following data read into R as a data frame named "data_old":

   yes year month
1  15 2004     5
2   9 2005     6
3  15 2006     3
4  12 2004     5
5  14 2005     1
6  15 2006     7
.   .  ...     .
.   .  ...     .

I have written a small loop which goes through the data and sums up the yes variable for each month/year combination:

year_f <- c(2004:2006)
month_f <- c(1:12)

for (i in year_f){
    for (j in month_f){
        x <- subset(data_old, month == j & year == i, select="yes")
        if (nrow(x) > 0){
            print(sum(x))
            }
        else{print("Nothing")}
        }
    }

My question is this: I can print the sum for each month/year combination in the terminal, but how do I store it in a vector? (The nested loop is giving me headaches trying to figure this out.)

Thomas

+6  A: 

Forget the loops, you want to use an aggregation function. There's a recent discussion of them in this SO question.

with(data_old, tapply(yes, list(year, month), sum))

is one of many solutions.

Also, you don't need to use c() when you aren't concatenating anything. Plain 1:12 is fine.
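
To get from that result to the vector the question asks for, here is a minimal sketch. It assumes data_old holds only the six rows shown in the question (the real data has more), and the names sums and sums_vec are just illustrative:

```r
# Only the six rows visible in the question; the real data_old is longer.
data_old <- data.frame(
  yes   = c(15, 9, 15, 12, 14, 15),
  year  = c(2004, 2005, 2006, 2004, 2005, 2006),
  month = c(5, 6, 3, 5, 1, 7)
)

# tapply returns a matrix: one row per year, one column per month.
# Year/month combinations with no rows come back as NA.
sums <- with(data_old, tapply(yes, list(year, month), sum))

# Index by dimnames to look up a single combination.
sums["2004", "5"]

# Flatten to a plain numeric vector, dropping the empty combinations.
sums_vec <- sums[!is.na(sums)]
```

Note that flattening loses the year/month labels, so keep the matrix around if you need to know which sum belongs to which combination.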

Richie Cotton
+7  A: 

Another way,

library(plyr)
ddply(data_old, .(year, month), function(x) sum(x$yes))

  year month V1
1 2004     5 27
2 2005     1 14
3 2005     6  9
4 2006     3 15
5 2006     7 15
gd047
or `ddply(data_old,.(year,month),summarize, yes = sum(yes))`
JoFrhwld
Cheers, worked beautifully!
Thomas Jensen
+3  A: 

Just to add a third option:

aggregate(yes ~ year + month, FUN=sum, data=data_old)
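
The result is a data frame, so the sums come out as an ordinary column. A small sketch of pulling them into a vector, again assuming only the six rows shown in the question (res and yes_sums are made-up names):

```r
# Only the six rows visible in the question; the real data_old is longer.
data_old <- data.frame(
  yes   = c(15, 9, 15, 12, 14, 15),
  year  = c(2004, 2005, 2006, 2004, 2005, 2006),
  month = c(5, 6, 3, 5, 1, 7)
)

# One row per year/month combination that actually occurs in the data.
res <- aggregate(yes ~ year + month, FUN = sum, data = data_old)

# Sort by year, then month, so the vector is in calendar order.
res <- res[order(res$year, res$month), ]

# The sums as a plain numeric vector.
yes_sums <- res$yes
```

Unlike the loop in the question, combinations with no data simply do not appear, so there is no "Nothing" case to handle.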
rcs
IMO, this is the way it should be done. It's clearer to the average programmer: we are aggregating, not "ddplying".
Vince
It's a good way to solve this problem, but my brain doesn't fit all of the options for other problems. The nice thing about plyr is that I just remember the pattern: split/operation/merge. ddply is the right function if I'm splitting a data.frame based on some columns and building a new data.frame on the results of an operation on the pieces.
Harlan
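
For what it's worth, the split/operation/merge pattern described above can also be spelled out in base R without plyr. This is only a rough sketch of the idea, not how ddply is implemented, and it assumes the six rows shown in the question (pieces and sums are illustrative names):

```r
# Only the six rows visible in the question; the real data_old is longer.
data_old <- data.frame(
  yes   = c(15, 9, 15, 12, 14, 15),
  year  = c(2004, 2005, 2006, 2004, 2005, 2006),
  month = c(5, 6, 3, 5, 1, 7)
)

# Split: one vector of yes values per year/month combination.
# drop = TRUE discards combinations that never occur in the data.
pieces <- split(data_old$yes, list(data_old$year, data_old$month), drop = TRUE)

# Operation + merge: sum each piece and collect the results into a
# named numeric vector (names look like "2004.5" for May 2004).
sums <- vapply(pieces, sum, numeric(1))
```

The result is already a vector, with the year/month combination kept in the names.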