In R, what would be the most efficient/simplest way to count runs of identical elements in a sequence?

For example, how can I count the lengths of the runs of consecutive zeros in a sequence of non-negative integers?

c(1,0,0,0,1,0,0,0,0,0,2,0,0) should give 3,5,2.

Thanks.

+11  A: 

Use rle():

y <- rle(c(1,0,0,0,1,0,0,0,0,0,2,0,0))
y$lengths[y$values==0]
Rob Hyndman
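To see why this works, here is a minimal sketch of what rle() returns for the example from the question. The helper name run_lengths_of is hypothetical (it is not part of the answer or of base R); it only wraps the two lines above.

```r
# Hypothetical helper (not from the original answer):
# lengths of the runs in `x` whose value equals `value`.
run_lengths_of <- function(x, value = 0) {
  r <- rle(x)                      # one entry per run: r$lengths and r$values
  r$lengths[r$values == value]     # keep only the runs of the requested value
}

x <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0)

r <- rle(x)
r$values           # 1 0 1 0 2 0
r$lengths          # 1 3 1 5 1 2

run_lengths_of(x)  # 3 5 2
```

rle() reports every run, so the filter on r$values is what restricts the result to the runs of zeros.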
+3  A: 

This can be done in an efficient way by using indexes of where the values change:

x <- c(1,0,0,0,1,2,1,0,0,1,1)

Find where the values change:

diffs <- x[-1L] != x[-length(x)]

Get the indexes where each run ends, then take the differences between successive indexes to obtain the length of every run (not only the runs of zeros):

idx <- c(which(diffs), length(x))
diff(c(0, idx))
Shane
That's essentially what rle() is doing.
Rob Hyndman
Sorry Rob. Wrote that on my iPhone earlier, and there's no "app for that". :). Please vote for Rob's answer instead of mine!
Shane
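For completeness, the change-point steps from the answer above can be combined and filtered down to the runs of zeros. This is only a sketch: the function name change_point_runs and the empty-input guard are assumptions added for illustration.

```r
# Hypothetical helper: run lengths via change points, restricted to `value`.
change_point_runs <- function(x, value = 0) {
  n <- length(x)
  if (n == 0L) return(integer(0))   # guard added here; x[-n] misbehaves when n is 0
  changed <- x[-1L] != x[-n]        # TRUE where an element differs from its predecessor
  ends <- c(which(changed), n)      # index of the last element of each run
  lens <- diff(c(0L, ends))         # length of every run
  lens[x[ends] == value]            # keep runs whose value equals `value`
}

x <- c(1, 0, 0, 0, 1, 2, 1, 0, 0, 1, 1)
change_point_runs(x)  # 3 2
```

Indexing x by the run ends recovers the value of each run, which plays the same role as r$values in the rle() version.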
A: 

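Thanks Rob. Here are three candidates, ranked by code length (shortest first). The 1s added at both ends of y act as sentinels, so that leading and trailing runs of zeros are also counted by the diff-based versions: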

y <- c(0,0,1,0,0,1,1,0,1,0,0,2,2,2,0,0,0); y <- c(1,y,1);

r<-diff(which(y>0))-1;r[r>0]

r<-rle(y);r$lengths[r$values==0]

r<-diff((1:length(y))[y>0])-1;r[r>0]

Anything shorter?

PS: Saw Shane's answer only after posting mine. Please vote for Rob.

knot
You should also "accept" his answer when you get a chance.
Shane
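As a quick check of the one-liners in the last answer, the following sketch compares the sentinel-based version with rle() on the same data. The variable names padded, gaps, via_diff and via_rle are introduced here for illustration, and the diff-based version assumes non-negative values, as in the question.

```r
y <- c(0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 2, 2, 2, 0, 0, 0)

# Pad with non-zero sentinels so zero runs at either end are bounded.
padded <- c(1, y, 1)

# Gap between consecutive non-zero positions, minus one, is the number of zeros between them.
gaps <- diff(which(padded > 0)) - 1
via_diff <- gaps[gaps > 0]         # drop gaps of 0 (adjacent non-zero elements)

# Same result from rle(); no padding is needed here.
r <- rle(y)
via_rle <- r$lengths[r$values == 0]

via_diff                 # 2 2 1 2 3
all(via_diff == via_rle) # TRUE
```

Without the sentinels, the diff-based version would miss the run of two zeros at the start and the run of three at the end.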