I have a dataset that looks like so:

            x             y
1       0.0000  0.4459183993
2     125.1128  0.4068805502
3     250.2257  0.3678521348
4     375.3385  0.3294434397
5     500.4513  0.2922601919
6     625.5642  0.2566381551
7     750.6770  0.2229130927
8     875.7898  0.1914207684
9    1000.9026  0.1624969456
10   1126.0155  0.1364773879
11   1251.1283  0.1136978589
12   1376.2411  0.0944717371
13   1501.3540  0.0786550515
14   1626.4668  0.0656763159
15   1751.5796  0.0549476349
16   1876.6925  0.0458811131
17   2001.8053  0.0378895151
18   2126.9181  0.0304416321
19   2252.0309  0.0231041362
20   2377.1438  0.0154535572
21   2502.2566  0.0070928195
22   2627.3694 -0.0020708606
23   2752.4823 -0.0119351534
24   2877.5951 -0.0223944877
25   3002.7079 -0.0332811155
26   3127.8208 -0.0442410358
27   3252.9336 -0.0548855203
...

Full data available here.

It's easier to see by plotting y against x with a horizontal line at y = 0:

library(ggplot2)
ggplot(dat, aes(x, y)) + geom_line() + geom_hline(yintercept = 0)

You can see the plot here if you don't want to download the data and plot it yourself.

I want to pick out 'patches', where a patch is the distance along x from the point where the line rises above zero on y to the point where it drops below zero. There will always be at least one patch (since the line starts above zero), but there can be many.

Picking out the first patch is easy.

patch1 <- dat[min(which(dat$y <= 0.000001)), ]

But how would I loop through and pick up subsequent patches?

+3  A: 

Here's a complete working solution:

# sample data
df <- data.frame(x=1:10, y=rnorm(10))
# rows where "y" crosses from non-positive to positive
idx <- which(c(FALSE, diff(df$y > 0) == 1))
# distance along "x" between successive upward crossings (measured from x = 0 for the first)
patches <- diff(c(0, df[idx, "x"]))
Shane
Sorry... haven't tested this. Also, I added an additional clause to `idx` so that it only picks out positive values. In retrospect, you could do this in one step by checking the difference across y > 0 (rather than using the change in sign).
Shane
I want the difference in the x values. So if it goes positive at x=1000 and comes back down below zero at x=1400, I want patch2=400. But just the entire row for each crossing would be fine, because I can process those later.
Maiasaura
Updated with complete solution. You can turn this into a function or put it all on one line (no need to assign `idx`).
Shane
Thanks very much, Shane! Much appreciated.
Maiasaura
You also don't need the `which()` function; indexing works fine with the booleans directly.
Ken Williams
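To illustrate the point about `which()`, here is a minimal sketch on made-up toy data (the values in `df` are hypothetical, chosen only so the crossings are easy to read off). It shows that indexing with the logical vector and indexing with `which()` select the same rows, assuming `y` contains no `NA` values; with `NA`s the two differ, because `which()` drops them.

```r
# Toy data (hypothetical values, not the asker's dataset)
df <- data.frame(x = 1:8, y = c(0.5, -0.2, 0.3, 0.4, -0.1, -0.3, 0.2, 0.1))

# TRUE at rows where y goes from non-positive to positive
up <- c(FALSE, diff(df$y > 0) == 1)

# Integer indexing via which() and direct logical indexing
x_which   <- df[which(up), "x"]
x_logical <- df[up, "x"]

# Both pick out the same x values (rows 3 and 7 here)
identical(x_which, x_logical)
```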
@Ken: Good point. That was a hold-over from an earlier solution where I was taking differences.
Shane
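For the definition given in the question (the distance along x from where the line goes positive to where it next drops to zero or below), one possible sketch uses `rle()` on the sign of `y`. This is not the code from the answer above: the function name `patch_lengths`, the column names of the result, and the toy data are all assumptions made for illustration. It also assumes `y` has no `NA` values, and it ends a patch that is still positive at the end of the data on the last row.

```r
# Hypothetical helper: one row per positive run of y
patch_lengths <- function(x, y) {
  runs   <- rle(y > 0)                 # consecutive runs of positive / non-positive y
  ends   <- cumsum(runs$lengths)       # index of the last row in each run
  starts <- ends - runs$lengths + 1    # index of the first row in each run
  keep   <- runs$values                # TRUE for the positive runs

  # A patch ends at the first non-positive row after the run,
  # or at the last row if the data stop while y is still positive
  stop_idx <- pmin(ends[keep] + 1, length(x))

  data.frame(start = x[starts[keep]],
             end   = x[stop_idx],
             len   = x[stop_idx] - x[starts[keep]])
}

# Toy data (hypothetical values, not the asker's dataset)
x <- seq(0, 700, by = 100)
y <- c(0.5, 0.2, -0.1, -0.3, 0.4, 0.2, 0.1, -0.2)

res <- patch_lengths(x, y)
res$len   # 200 300
```

With the asker's data this would be called as `patch_lengths(dat$x, dat$y)`. Keeping `start` and `end` alongside `len` makes it possible to go back to the full rows for each crossing later.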