ansaurus

Question

calculating sums of unique values in a log in R

Answer 1

A:

If your problem is only computation time, I bet the better idea would be to implement your algorithm as a C chunk: first use R to convert the keys to a contiguous range of integers (as.numeric(factor(...))), then use a boolean array in C, indexed by that integer, to count the unique keys easily and very fast. Remember that neither plyr nor the standard R *apply functions are significantly faster than loops (provided both are used without embarrassing errors, of course).
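A minimal sketch of the C side of that idea, assuming the keys have already been recoded in R to the integers 1..K (for example with as.integer(factor(items$key)); C needs int rather than the double that as.numeric() returns). The function name count_unique_running and its argument order are hypothetical; every argument is a pointer so that it fits R's .C() calling convention. For each position i it stores the number of distinct keys seen so far.

```c
#include <stdlib.h>

/*
 * Hypothetical .C()-style helper: running count of distinct keys.
 * Assumes every keys[i] lies in 1..*nlevels (as produced by factor codes).
 * out[i] receives the number of distinct keys among keys[0..i].
 */
void count_unique_running(int *keys, int *n, int *nlevels, int *out)
{
    /* one "seen" flag per level; index 0 is unused because codes start at 1 */
    unsigned char *seen = calloc((size_t)*nlevels + 1, 1);
    int count = 0;

    if (seen == NULL)
        return; /* allocation failed: out is left untouched */

    for (int i = 0; i < *n; i++) {
        int k = keys[i];
        if (!seen[k]) {      /* first occurrence of this key */
            seen[k] = 1;
            count++;
        }
        out[i] = count;      /* distinct keys up to and including row i */
    }
    free(seen);
}
```

After compiling with R CMD SHLIB and loading the result with dyn.load(), it could be called as .C("count_unique_running", k, length(k), nlevels(f), out = integer(length(k)))$out, where f is the factor and k <- as.integer(f). With the example keys 12, 49, 42, 12, 49 (codes 1, 3, 2, 1, 3) this gives the same running counts as nunique in the answer below. The boolean array makes each lookup constant time, so the whole pass is linear in the number of log rows.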

mbq 2010-08-25 21:44:25

I think it is what I've written, or I just don't understand your comment.

mbq 2010-08-26 08:40:17

Answer 2

+2 A:

If my interpretation is right, then this should do it:

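items <- data.frame(ts = c(3, 8, 12, 46, 100),
                    key = c(12, 49, 42, 12, 49),
                    event = c(1, 1, 1, -1, 1))

# number of keys whose running sum has reached zero, no ddply necessary
# (ave() gives the running sum of events within each key; this count never
#  decreases, so it assumes a key does not reopen after reaching zero)
nzero <- cumsum(ave(items$event, items$key, FUN = cumsum) == 0)

# number of unique keys seen up to each timepoint
nunique <- rep(FALSE, length(items$key))
nunique[match(unique(items$key), items$key)] <- TRUE
nunique <- cumsum(nunique)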

# fraction of the keys seen so far whose running sum is not zero:
items$p <- (nunique-nzero)/nunique

items
   ts key event         p
1   3  12     1 1.0000000
2   8  49     1 1.0000000
3  12  42     1 1.0000000
4  46  12    -1 0.6666667
5 100  49     1 0.6666667

Joris Meys 2010-08-26 08:50:01

I like this solution, very efficient and elegant, thanks!

mkhq 2010-08-26 17:41:00
