Let's say I have two columns of data. The first contains categories such as "First", "Second", "Third", etc. The second contains numbers that represent how many times I saw the corresponding category.

For example:

Category     Frequency
First        10
First        15
First        5
Second       2
Third        14
Third        20
Second       3

I want to sort the data by Category and add up the Frequencies:

Category     Frequency
First        30
Second       5
Third        34

How would I do this in R? I looked at the sort and order functions, but I don't know how to sum the Frequencies within each Category.

+4  A: 

If x is a data frame containing your data, then the following will do what you want:

require(reshape)
recast(x, Category ~ ., fun.aggregate=sum)
Rob Hyndman
+4  A: 

Using plyr, where tbl is your data frame:

library(plyr)
ddply(tbl, .(Category), summarise, sum = sum(Frequency))
learnr
+2  A: 

Just to add a third option:

require(doBy)
summaryBy(Frequency~Category, data=yourdataframe, FUN=sum)
dalloliogm
+7  A: 

Using aggregate:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                  "Third", "Third", "Second")), 
                Frequency=c(10,15,5,2,14,20,3))
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34
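
A possible variant, offered as a sketch rather than a definitive recipe: aggregate also has a formula interface, which keeps the column name Frequency in the result instead of x. The data frame below rebuilds the example data from the question; the name res is just an illustrative choice.

```r
# Example data from the question
x <- data.frame(Category = c("First", "First", "First", "Second",
                             "Third", "Third", "Second"),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))

# Sum Frequency within each Category using the formula interface
res <- aggregate(Frequency ~ Category, data = x, FUN = sum)
print(res)
#   Category Frequency
# 1    First        30
# 2   Second         5
# 3    Third        34
```

The left-hand side of the formula names the column to summarise and the right-hand side names the grouping column, so more grouping columns can be added with +.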
rcs
This answer is the only one that doesn't use an external package. However, I still prefer doBy, which lets you apply more than one function per group and has a nicer syntax.
dalloliogm
+3  A: 

This is somewhat related to this question.

You can also just use the by() function:

x2 <- by(x$Frequency, x$Category, sum)
do.call(rbind,as.list(x2))

Those other packages (plyr, reshape) have the benefit of returning a data.frame, but it's worth being familiar with by() since it's a base function.
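
If a data.frame is wanted from by() without extra packages, one way to get there is sketched below. It assumes the grouping is on a single column, so that the by() result is a one-dimensional object whose names are the category labels; res is a hypothetical name for the result.

```r
# Example data from the question
x <- data.frame(Category = c("First", "First", "First", "Second",
                             "Third", "Third", "Second"),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))

# One sum per category; the result has class "by"
x2 <- by(x$Frequency, x$Category, sum)

# Pull the labels and the sums out into an ordinary data.frame
res <- data.frame(Category = names(x2),
                  Frequency = as.numeric(x2))
print(res)
```

as.numeric() drops the "by" class and the other attributes, leaving a plain numeric vector in the same order as the names.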

Shane