I use R and I have a long numeric vector. I would like to find all the maximal contiguous subranges in this vector where all values are lower than some threshold.

For example, if the given vector is

5 5 6 6 7 5 4 4 4 3 2 1 1 1 2 3 4 5 6 7 6 5 4 3 2 2 3 4 4

and my threshold is 4 (i.e., values <= 3), then the values that meet this condition are marked with x:

0 0 0 0 0 0 0 0 0 x x x x x x x 0 0 0 0 0 0 0 x x x x 0 0

I would also like to return something like (10,16), (24,27). How do I do that?

+1  A: 

The answer to your first question is pretty straightforward:

x <- c(5,5,6,6,7,5,4,4,4,3,2,1,1,1,2,3,4,5,6,7,6,5,4,3,2,2,3,4,4)
y <- x<=3

y
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[25]  TRUE  TRUE  TRUE FALSE FALSE

as.numeric(y)
[1] 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0

Getting the indices the way you want them is more difficult.
You can try which as proposed by whatnick.
Another possibility is to use match. It returns the position of the first element that matches. So match(1,y) would return 10. match(0,y[10:length(y)]) would return 8, the offset of the first FALSE counted from position 10, so the run ends at 10 + 8 - 2 = 16. If you put this into a while-loop, you can get the indices as you like.
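The while-loop idea could be sketched roughly as follows. This is only one way to write it, not necessarily what was intended; find_runs is a hypothetical helper name, and the sketch assumes y is a logical vector.

```r
x <- c(5,5,6,6,7,5,4,4,4,3,2,1,1,1,2,3,4,5,6,7,6,5,4,3,2,2,3,4,4)
y <- x <= 3

# find_runs is a hypothetical helper, not part of base R
find_runs <- function(y) {
  starts <- integer(0)
  ends <- integer(0)
  n <- length(y)
  pos <- 1L
  while (pos <= n) {
    # offset of the next TRUE at or after pos (NA if there is none)
    s <- match(TRUE, y[pos:n])
    if (is.na(s)) break
    start <- pos + s - 1L
    # offset of the next FALSE at or after start (NA if the run reaches the end)
    e <- match(FALSE, y[start:n])
    end <- if (is.na(e)) n else start + e - 2L
    starts <- c(starts, start)
    ends <- c(ends, end)
    pos <- end + 1L
  }
  data.frame(start = starts, end = ends)
}

runs <- find_runs(y)
runs
```

For the example vector this gives the ranges (10,16) and (24,27). Growing starts and ends with c() on every pass is fine for a handful of runs but gets slow if there are very many.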

Henrik
He means that he will need to locate the transition boundaries. With your example it will be "transitions<-which(diff(1-as.numeric(y))!=0)"
whatnick
@whatnick. I finally got it, thanks to your comment. And sorry for mixing up < and >. I changed it now. Still too early...
Henrik
A: 

The function you need is "which". The syntax will be indices<-which(vector<=3). This will give you a vector of the indices where the value meets the condition. To isolate transitions you can take the differences between consecutive indices (diff). Wherever the difference exceeds 1 you have a transition boundary.
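A minimal sketch of that which/diff idea, using the vector from the question. The names idx, breaks, starts and ends are only illustrative, and the sketch assumes at least one value meets the condition (otherwise idx is empty and the indexing below yields NA).

```r
x <- c(5,5,6,6,7,5,4,4,4,3,2,1,1,1,2,3,4,5,6,7,6,5,4,3,2,2,3,4,4)

# positions where the condition holds
idx <- which(x <= 3)

# a gap larger than 1 between consecutive positions separates two runs
breaks <- which(diff(idx) > 1)

# each run starts at the first position, or right after a gap
starts <- idx[c(1, breaks + 1)]
# each run ends right before a gap, or at the last position
ends <- idx[c(breaks, length(idx))]

cbind(starts, ends)
```

For the example this yields the pairs (10,16) and (24,27).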

whatnick
This is exactly my question. Given a boolean vector, how do I convert it to a set of `true` ranges?
David B
+5  A: 

To get the ranges you can use rle

First create the encoding

x <- c(5,5,6,6,7,5,4,4,4,3,2,1,1,1,2,3,4,5,6,7,6,5,4,3,2,2,3,4,4)
enc <- rle(x <= 3)

enc.endidx <- cumsum(enc$lengths) #ending indices
enc.startidx <- c(0, head(enc.endidx, -1)) + 1 # starting indices (head(., -1) also works when there is only one run)

data.frame(startidx=enc.startidx[enc$values], endidx=enc.endidx[enc$values])

That should give you

  startidx endidx
1       10     16
2       24     27
Sameer
If you really want a list of the ranges, you could split the data frame out by rows, i.e. if d is the data frame at the end, you want list(d[1,], d[2,]). It might be a better idea to leave it as a data frame, though. When you need to use the ranges, you can use the apply function along the rows of this data frame.
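A rough sketch of both options, assuming d holds the data frame from the answer. Taking the mean of x within each range is just an illustrative choice of what to do with the ranges.

```r
x <- c(5,5,6,6,7,5,4,4,4,3,2,1,1,1,2,3,4,5,6,7,6,5,4,3,2,2,3,4,4)
d <- data.frame(startidx = c(10, 24), endidx = c(16, 27))

# option 1: a list with one single-row data frame per range
ranges <- split(d, seq_len(nrow(d)))

# option 2: keep the data frame and apply a function along its rows;
# each row arrives as a named numeric vector r
run_means <- apply(d, 1, function(r) mean(x[r["startidx"]:r["endidx"]]))
run_means
```

split() generalizes list(d[1,], d[2,]) to any number of rows.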
Sameer
+1 Thank you Sameer
David B
Just a comment - `rle()` itself is implemented in terms of `which()`, and seems to copy the input vector several times. Presumably it could be made quite a bit more efficient if done with a loop or in native code. I have no empirics to back that up though.
Ken Williams