Hi,

I am trying to compute a "group by"-style weighted mean in R. For a plain mean, the following code (using Hadley's plyr package) worked well.

ddply(mydf,.(period),mean)

If I use the same approach with weighted.mean, I get the error "'x' and 'w' must have the same length", which I do not understand because the weighted.mean call works outside ddply.

weighted.mean(mydf$mycol,mydf$myweight) # works just fine
ddply(mydf,.(period),weighted.mean,mydf$mycol,mydf$myweight) # returns the error described above
ddply(mydf,.(period),weighted.mean(mydf$mycol,mydf$myweight)) # different code same story

I thought of writing a custom function instead of using weighted.mean and passing it to ddply, or even writing something new from scratch with subset. That seems like too much work, though, and hopefully there is a smarter solution using what's already there.

Thanks in advance for any suggestions!

+5  A: 

Use an anonymous function:

> ddply(iris,"Species",function(X) data.frame(wmn=weighted.mean(X$Sepal.Length,
+                                                               X$Petal.Length),
+                                             mn=mean(X$Sepal.Length)))
     Species      wmn    mn
1     setosa 5.016963 5.006
2 versicolor 5.978075 5.936
3  virginica 6.641535 6.588
> 

This computes the weighted mean of Sepal.Length (weighted by Petal.Length) as well as the unweighted mean, and returns both.

Dirk Eddelbuettel
This is nice. I haven't had much to do with anonymous functions so far, but they seem really worth a look. I don't fully get the syntax or the idea yet, but I will look into it. Thanks for your help! Do you need to write everything in one line because there is no "{}" in there? Where can I learn something about anonymous functions?
ran2
Well, *all* these `*apply`, `by`, ... functions use anonymous functions, so you should find plenty of examples. Curly braces are needed once you group more than one command. Lastly, you do not have to use an anonymous function -- you can also define your own -- but using them saves on typing :)
Dirk Eddelbuettel
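As a minimal sketch of the "define your own" option mentioned above: a named function can be passed to ddply in place of the anonymous one. The name `group_wmean` is made up for this illustration. ddply hands each group's rows to the function as a data frame, which is also the likely reason the original call failed: `mydf$mycol` and `mydf$myweight` still refer to the full columns, not to the current group.

```r
library(plyr)

# Hypothetical helper (name chosen for illustration only).
# 'd' is the data frame holding the rows of a single group.
group_wmean <- function(d) {
  data.frame(wmn = weighted.mean(d$Sepal.Length, d$Petal.Length))
}

# ddply calls group_wmean once per Species and row-binds the results.
res <- ddply(iris, "Species", group_wmean)
print(res)
```

Curly braces are used here only because the function body is written on its own lines; a one-expression body would work without them.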
What about `lapply(split(iris, iris$Species), weighted.mean)` or something like that?
aL3xa
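A sketch of how the split-based idea could be made to work in base R, assuming the same iris columns as in the answer above. Passing `weighted.mean` directly would not be enough, because each piece is a whole data frame; a small wrapper has to pick out the value and weight columns.

```r
# Split the data frame into one piece per species.
groups <- split(iris, iris$Species)

# For each piece, take the weighted mean of Sepal.Length, weighted by Petal.Length.
# sapply returns a named numeric vector, one entry per species.
wmn <- sapply(groups, function(d) weighted.mean(d$Sepal.Length, d$Petal.Length))
print(wmn)
```

This needs no extra packages, but it returns a named vector rather than a data frame.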
+3  A: 

Use summarise (or summarize):

ddply(iris, "Species", summarise, 
  wmn = weighted.mean(Sepal.Length, Petal.Length),
  mn = mean(Sepal.Length))
hadley
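Applied to the column names from the question, the summarise approach might look like the sketch below. The contents of `mydf` are made-up toy values for illustration; only the names `period`, `mycol` and `myweight` come from the question.

```r
library(plyr)

# Made-up toy data standing in for the asker's mydf (illustration only).
mydf <- data.frame(
  period   = c(1, 1, 2, 2),
  mycol    = c(1, 3, 10, 20),
  myweight = c(1, 3, 1, 1)
)

# Inside summarise, column names are evaluated within each group,
# so mycol and myweight always have matching lengths.
res <- ddply(mydf, .(period), summarise,
             wmn = weighted.mean(mycol, myweight),
             mn  = mean(mycol))
print(res)
```

For period 1 the weighted mean is (1*1 + 3*3) / (1 + 3) = 2.5, while the plain mean is 2; for period 2 the weights are equal, so both come out as 15.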