
Before I start, I'd like to say that this may not be the appropriate medium for such a question, so if I am out of place, please tell me.

I do statistical work for an offshoot of a land-grant university and have been presented with a problem that I think may be too complicated to solve. Hopefully you can confirm or correct me.

We are working on a social capital project, so our data set lists each individual's organizational memberships. Each person gets a numeric ID and then a sub-ID for each group they are in, so the unit of analysis is the membership: one row per person per group. One of our variables is a three-point scale for the type of group. Sounds simple enough, right?

We want to bring the unit of analysis up to the individual level and condense the group-type variable into a single variable recording how many different types of groups each person is in.

For instance, person one is in eight groups: three are type 1, three are type 2, and two are type 3. Ideally, her individual-level value would be 3, because she is in all three types of groups.
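The target value for person one can be checked directly. A minimal R sketch, where the vector name `types` is hypothetical: tabulating gives the number of memberships per type, while the number of distinct values is the individual-level quantity being asked for.

```r
# Group types for person one's eight memberships (from the example above):
# three of type 1, three of type 2, two of type 3
types <- c(1, 1, 1, 2, 2, 2, 3, 3)

# Memberships per type: 3, 3, 2
table(types)

# Number of distinct group types this person belongs to
n_types <- length(unique(types))
n_types
```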

Is this possible at all?

A:

I think what you're asking is whether it is possible to count the number of unique types of group to which an individual belongs.

If so, then that is certainly possible.
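As a minimal base-R sketch, assuming the memberships sit in a data frame with one row per person-group pair and hypothetical columns `id`, `group`, and `type`: `tapply` splits the type column by person and counts the distinct values within each piece.

```r
# Hypothetical membership data: one row per person-group pair
memberships <- data.frame(
  id    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  group = c(1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3),
  type  = c(1, 1, 1, 2, 2, 2, 3, 3, 2, 2, 1)
)

# Count distinct group types within each person's rows
n_types <- tapply(memberships$type, memberships$id,
                  function(x) length(unique(x)))

# Turn the named result into an individual-level data frame
individuals <- data.frame(id = as.numeric(names(n_types)),
                          n_types = as.vector(n_types))
individuals
```

Person 1 ends up with 3 (types 1, 2 and 3) and person 2 with 2 (types 1 and 2).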

I can't tell you how to do it in R, since I don't know much R and I don't know what your data look like. But there's no reason it wouldn't be possible.

Is this data coming from a database? If so, it might be easier to write a SQL query that computes the value you want than to do it in R. If you describe your schema, plenty of people here could give you the query you need.

jbourque
A:
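## simulate data
## number of individuals
n <- 10
## number of groups
g <- 5
## number of group types
gt <- 3
## number of individual-group memberships (rows)
N <- 20
## memberships data frame: one row per individual-group pair
## (sampled with replacement, so a person can draw the same group twice)
di <- data.frame(individual=sample(1:n,N,replace=TRUE),
                 group=sample(1:g,N, replace=TRUE))
## groups data frame: each group gets one type
dg <- data.frame(group=1:g, type=sample(1:gt,g,replace=TRUE))
## merge on the shared "group" column to attach each membership's type
dm <- merge(di,dg)
## order by individual - not necessary, but nice
dm <- dm[order(dm$individual),]
## number of distinct group types per individual
library(plyr)
dr <- ddply(dm, "individual", function(x) length(unique(x$type)))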

> head(dm)
   group individual type
2      2          1    2
8      2          1    2
20     5          1    1
9      3          3    2
12     3          3    2
17     4          3    2

> head(dr)
  individual V1
1          1  2
2          3  1
3          4  2
4          5  1
5          6  1
6          7  1
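Individual 2 does not appear in `dr`: in this simulated draw they were never sampled into a group, so `ddply` has no rows to summarise for them. If the goal is an individual-level file covering everyone, one option (a sketch, not part of the answer above) is to merge the counts onto a full list of IDs and treat a missing count as zero. The small data frames below are hypothetical stand-ins for `dr` and for the list of all individuals.

```r
# Hypothetical per-individual counts, shaped like the ddply result `dr`
# (column V1 holds the number of distinct group types)
dr <- data.frame(individual = c(1, 3, 4), V1 = c(2, 1, 2))

# Full list of individual IDs, including people with no memberships
people <- data.frame(individual = 1:5)

# Left join: keep every person, even those absent from dr
out <- merge(people, dr, by = "individual", all.x = TRUE)

# People with no memberships belong to zero group types
out$V1[is.na(out$V1)] <- 0
names(out)[names(out) == "V1"] <- "n_types"
out
```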
Eduardo Leoni