ansaurus

Question

Answer 1

+2 A:

Is it necessary that the ID be a random 10-character string? If not, why not just paste together the grouping columns of the data frame? To keep the IDs compact, convert the factors to their numeric codes, then paste them together:

# the "_" separator keeps distinct pairs of codes from producing the same string
df$ID <- paste(as.numeric(df$st.num), as.numeric(df$st.name), sep = "_")
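Pasting integer codes with `sep = ""` loses the boundary between them, so two different groups can end up with the same string. A minimal illustration with made-up codes (the vectors `a` and `b` are hypothetical):

```r
# Hypothetical factor codes for two grouping columns
a <- c(1, 11)
b <- c(12, 2)

# Without a separator, (1, 12) and (11, 2) both become "112"
no_sep <- paste(a, b, sep = "")

# With a separator the two groups stay distinct: "1_12" and "11_2"
with_sep <- paste(a, b, sep = "_")

print(no_sep)
print(with_sep)
```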

Then, if you really do need 10-character IDs, I'd generate just n random IDs, one per level, and use them to rename the levels of ID:

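df$ID <- as.factor(df$ID)
n <- nlevels(df$ID)

# Return n random alphanumeric strings, each `size` characters long
getID <- function(n, size = 10){
  out <- character(n)
  for(i in seq_len(n)){
    # fill slot i with one random string
    out[i] <- paste(sample(c(0:9, LETTERS, letters), size, replace = TRUE), collapse = '')
  }
  return(out)
}

newLevels <- getID(n = n)

# Duplicate levels would silently merge groups, so redraw on a collision
while(anyDuplicated(newLevels)) newLevels <- getID(n = n)

levels(df$ID) <- newLevels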
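To check the renaming approach end to end, here is a small self-contained run on hypothetical data. The columns mimic st.num and st.name, and make_ids() is a hypothetical vectorised variant of getID(), not a function from any package:

```r
set.seed(1)  # make the random IDs reproducible

# Hypothetical toy data: rows 1 and 2 belong to the same group
df <- data.frame(
  st.num  = factor(c(1, 1, 2, 11)),
  st.name = factor(c("Main St", "Main St", "Oak Ave", "Oak Ave"))
)

# Step 1: one compact key per row, built from the factor codes
df$ID <- factor(paste(as.numeric(df$st.num), as.numeric(df$st.name), sep = "_"))

# Step 2: draw one random 10-character string per level (hypothetical helper)
make_ids <- function(n, size = 10) {
  chars <- c(0:9, LETTERS, letters)
  vapply(seq_len(n),
         function(i) paste(sample(chars, size, replace = TRUE), collapse = ""),
         character(1))
}
new_levels <- make_ids(nlevels(df$ID))
while (anyDuplicated(new_levels)) new_levels <- make_ids(nlevels(df$ID))

# Step 3: renaming the levels relabels every row of a group at once
levels(df$ID) <- new_levels
print(df)
```

Because the random strings are attached to factor levels rather than to rows, only one string is drawn per group, however many rows that group has.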

Also, as an aside, you don't need to wrap transform() in function(x) when calling ddply(); extra arguments to ddply() are passed on to the function, so this code works just the same:

ddply(df, c("st.num", "st.name"), transform, household=getString())

JoFrhwld 2010-07-17 20:57:12
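If plyr is not available, base R's ave() can attach one value per group in a similar way; this is a swapped-in base R technique, not the ddply() call above. A sketch on hypothetical data, assuming a hypothetical getString() stub that returns one random string per call (the question's own getString() is not shown here):

```r
set.seed(42)

# Hypothetical stand-in for the question's getString(): one random 10-char string
getString <- function(size = 10) {
  paste(sample(c(0:9, LETTERS, letters), size, replace = TRUE), collapse = "")
}

# Hypothetical toy data
df <- data.frame(
  st.num  = c(1, 1, 2, 2),
  st.name = c("Main St", "Main St", "Main St", "Oak Ave"),
  stringsAsFactors = FALSE
)

# ave() splits the first argument by the grouping columns, applies FUN to
# each piece, and writes the results back in the original row order
df$household <- ave(character(nrow(df)), df$st.num, df$st.name,
                    FUN = function(x) rep(getString(), length(x)))
print(df)
```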

Answer 2

+2 A:

Try using the id function (also in plyr):

df$id <- id(df[c("st.num", "st.name")], drop = TRUE)

hadley 2010-07-18 13:05:07
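For comparison, base R can produce a similar integer group index without plyr. A minimal sketch using interaction() with drop = TRUE on hypothetical data; the numbering order may differ from what id() returns, but rows in the same group always share a number:

```r
# Hypothetical toy data: rows 1 and 2 belong to the same group
df <- data.frame(
  st.num  = c(1, 1, 2, 11),
  st.name = c("Main St", "Main St", "Oak Ave", "Oak Ave")
)

# interaction() builds one factor level per combination of the inputs;
# drop = TRUE discards combinations that never occur in the data
grp <- interaction(df$st.num, df$st.name, drop = TRUE)

# The integer codes of that factor serve as group IDs 1..k
df$id <- as.integer(grp)
print(df)
```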

Apparently I need to go back and read the plyr documentation more carefully; this is exactly what I was looking for. I evaluated this solution and JoFrhwld's on my test dataset: a data frame with 164,961 observations and 91,876 unique groups based on 3 grouping variables. I used each method to assign a group ID variable 100 times. The mean elapsed time for id() was 0.958 (sd 0.0310); the mean elapsed time for pasting the grouping fields was 1.94 (sd 0.0946). Thanks to both!

danpelota 2010-07-20 02:45:30
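A timing comparison like the one described above can be sketched in base R with system.time() and replicate(). The data below are synthetic and much smaller than the 164,961-row test set, and interaction() stands in for plyr's id() so the sketch runs without plyr; it reproduces the method, not the reported numbers:

```r
set.seed(123)

# Synthetic data with three grouping variables (hypothetical sizes)
n <- 20000
df <- data.frame(
  a = sample(1:50, n, replace = TRUE),
  b = sample(letters, n, replace = TRUE),
  c = sample(1:20, n, replace = TRUE)
)

# Two ways to turn the grouping columns into integer group IDs
by_paste <- function(d) as.integer(factor(paste(d$a, d$b, d$c, sep = "_")))
by_interaction <- function(d) as.integer(interaction(d$a, d$b, d$c, drop = TRUE))

# Elapsed seconds for each of `reps` runs of a method on df
time_it <- function(f, reps = 20) {
  replicate(reps, system.time(f(df))[["elapsed"]])
}

t_paste <- time_it(by_paste)
t_inter <- time_it(by_interaction)

cat(sprintf("paste:       mean %.4f s (sd %.4f)\n", mean(t_paste), sd(t_paste)))
cat(sprintf("interaction: mean %.4f s (sd %.4f)\n", mean(t_inter), sd(t_inter)))
```

Both functions should partition the rows identically; only the ID numbering and the speed differ.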


Assigning group ID with ddply
