ansaurus

Question

Using values associated with unique values from a data frame

Answer 1

+3 A:

Not entirely following your question, but I think this is what you want:

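# Build the example data; read.table() already returns a data frame
# and names the columns V1 and V2
df <- read.table(textConnection("
A 2
A 7
B 1
B 3
B 6
C 2"))
library(plyr)
# Count the rows for each unique value of V1
ddply(df, .(V1), nrow)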

There are numerous ways to do this kind of thing, so you will need to provide more detail about what you're trying to do if you want a better answer.

Edit

In general, if you have a set of unique values and you want to apply a function to the rows associated with each one, you can do this with some version of an apply function. For the example above, here are a few different ways to get the mean of the second column for each value in the first column:

ddply(df, .(V1), function(x) data.frame(mean=mean(x[,2])))
do.call("rbind", by(df, df[,1], function(x) data.frame(mean=mean(x[,2]))))
do.call("rbind", lapply(unique(df[,1]), function(a) data.frame(V1=a, mean=mean(df[df[,1]==a,2]))))

Shane 2010-10-15 19:26:27

ABW 2010-10-15 19:46:57

Yes, just replace `nrow` with any function you want, and it will do what you describe.

Shane 2010-10-15 19:54:18

@Shane: The do.call reflex again? ;-)

Joris Meys 2010-10-15 23:55:13

@Joris Without using a package, any better suggestions for how to convert a list into a matrix?

Shane 2010-10-16 00:33:32

@Shane: cbind(by(df, df[,1], function(x) mean(x[,2]))) (gives the same as the first option), or aggregate(df[,2],list(df[,1]),FUN=mean) (for the second option)

Joris Meys 2010-10-16 11:39:30

Answer 2

+2 A:

The ave() or tapply() functions will do what you want; which one depends on what you want for output. If you want the output vector to be as long as the input vector, use ave(); if you want to reduce the data to the levels of your grouping vector, use tapply().

ave(mydata[,2], mydata[,1], FUN = length) #FUN can be any function

Or, for the reduced version...

tapply(mydata[,2], mydata[,1], FUN = length) #FUN can be any function
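
To make the difference concrete, here is a minimal sketch. The contents of `mydata` are an assumption for illustration (a grouping column and a value column), not data from the question.

```r
# Hypothetical example data: a grouping column (V1) and a value column (V2)
mydata <- data.frame(V1 = c("A", "A", "B", "B", "B", "C"),
                     V2 = c(2, 7, 1, 3, 6, 2))

# ave() returns one value per input row, so the result lines up with the rows
per_row <- ave(mydata[, 2], mydata[, 1], FUN = length)
per_row    # 2 2 3 3 3 1

# tapply() returns one value per group, named by the group labels
per_group <- tapply(mydata[, 2], mydata[, 1], FUN = length)
per_group  # A = 2, B = 3, C = 1
```

Because the ave() result has one entry per row, it can be stored as a new column, e.g. `mydata$n <- per_row`.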

John 2010-10-15 21:16:37

Answer 3

A:

Another possibility, using the df of Shane:

aggregate(df[,2],list(df[,1]),FUN=length)

Again, replace length with any other function that works on vectors. You can specify more than one factor in the list; the function is then applied to every combination of factor levels that occurs in the data.
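
As a rough sketch of the multi-factor case: the data frame `d` and its columns `g1`, `g2` and `value` below are made up for illustration.

```r
# Hypothetical data with two grouping columns and one value column
d <- data.frame(g1 = c("A", "A", "A", "B", "B", "B"),
                g2 = c("x", "x", "y", "x", "y", "y"),
                value = c(2, 7, 1, 3, 6, 2))

# Naming the list elements gives the group columns readable names;
# the aggregated values end up in a column called x
res <- aggregate(d$value, list(g1 = d$g1, g2 = d$g2), FUN = mean)
res  # one row per observed (g1, g2) combination, 4 rows here
```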

The difference from ave() is that ave() gives a vector with the same length as the number of rows in the original data frame. aggregate() returns a data frame where one variable is the group indicator. tapply() returns a vector with length equal to the number of groups. ddply() returns a data frame with a variable for every specified factor.

The by() construct is especially useful if you have to do operations on multiple columns, as it is basically a loop over data frames. It returns a list that can be converted using Shane's do.call("rbind", ...) construct, or by using matrix() or rbind() directly. Each of these gives a somewhat different structure, but all of them are useful.
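
A small sketch of by() working on several columns at once, assuming a made-up data frame `d` with a grouping column `g` and two numeric columns `x` and `y`:

```r
# Hypothetical data: one grouping column and two numeric columns
d <- data.frame(g = c("A", "A", "B", "B", "B", "C"),
                x = c(2, 7, 1, 3, 6, 2),
                y = c(10, 20, 30, 40, 50, 60))

# by() hands each group's sub-data-frame to the function,
# so the function can use any of the columns
pieces <- by(d, d$g, function(sub) {
  data.frame(g = sub$g[1], mean_x = mean(sub$x), sum_y = sum(sub$y))
})

# Bind the one-row data frames (one per group) into a single data frame
res <- do.call("rbind", pieces)
res
```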

Depending on the format you want for your output, you can choose one of these possibilities.

Joris Meys 2010-10-16 11:37:43
