Hello

I have a time series (a zoo object with a chron index) and I need to calculate `cummax(mydata) - mydata` for each day separately, getting a zoo time series back.

I've tried `aggregate(mydata, as.Date, cummax)`, but aggregate can only produce a single scalar result for each subset instead of a vector.

I've read that this might be doable with tapply, lapply, plyr, cut, rollapply, ... but I couldn't get it to work.

Can anyone help me, please?

cheers

A: 

zoo has a `cummax` method, so you shouldn't have any trouble getting a zoo result. Perhaps you're making this more difficult than it is. Is this what you want?

> set.seed(21)
> z <- zoo(runif(10),as.chron(Sys.Date()-10:1))
> merge(z,cummax=cummax(z),diff=cummax(z)-z)
                  z    cummax      diff
08/09/10 0.66754012 0.6675401 0.0000000
08/10/10 0.93521022 0.9352102 0.0000000
08/11/10 0.05818433 0.9352102 0.8770259
08/12/10 0.61861583 0.9352102 0.3165944
08/13/10 0.17491846 0.9352102 0.7602918
08/14/10 0.03767539 0.9352102 0.8975348
08/15/10 0.52531317 0.9352102 0.4098971
08/16/10 0.28218425 0.9352102 0.6530260
08/17/10 0.49904520 0.9352102 0.4361650
08/18/10 0.63382510 0.9352102 0.3013851

Since that's pretty easy, I'm guessing your time series has an intraday frequency. If so, the code is more involved, but this should do the trick:

> require(xts)  # for the endpoints() function
> set.seed(21)
> z <- zoo(runif(10),as.chron(Sys.Date()-seq(0.5,3,length.out=10)))
> ep <- endpoints(z,"days")
> Z <- lapply(1:(length(ep)-1), function(x) cummax(z[(ep[x]+1):ep[x+1]]))
> Z <- do.call(rbind, Z)
> merge(z,Z,Z-z)
                            z         Z     Z - z
(08/16/10 00:00:00) 0.8493961 0.8493961 0.0000000
(08/16/10 06:40:00) 0.9860037 0.9860037 0.0000000
(08/16/10 13:20:00) 0.1721917 0.9860037 0.8138120
(08/16/10 20:00:00) 0.1018046 0.9860037 0.8841991
(08/17/10 02:40:00) 0.9186834 0.9186834 0.0000000
(08/17/10 09:20:00) 0.9596138 0.9596138 0.0000000
(08/17/10 16:00:00) 0.1844608 0.9596138 0.7751531
(08/17/10 22:40:00) 0.6992523 0.9596138 0.2603615
(08/18/10 05:20:00) 0.2524456 0.2524456 0.0000000
(08/18/10 12:00:00) 0.7861149 0.7861149 0.0000000
Joshua Ulrich
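The `endpoints()` call above returns 0 followed by the index of the last observation in each day, ending with the number of rows; the loop then runs `cummax` over each `(ep[x]+1):ep[x+1]` block. For readers without xts, the same idea can be sketched in base R. `day_endpoints()` and `running_max_by_block()` are hypothetical helper names, and the sketch assumes the observations are sorted in time, that `day` holds one calendar-day label per row, and that there is at least one observation.

```r
# Hypothetical helper: 0, then the last row of each day, ending with n.
# Assumes rows are sorted in time and the vector is non-empty.
day_endpoints <- function(day) {
  n <- length(day)
  # Position i is a day end when the label changes between rows i and i + 1
  c(0L, which(day[-1L] != day[-n]), n)
}

# Hypothetical helper: cummax restarted at each block (ep[i] + 1):ep[i + 1]
running_max_by_block <- function(x, ep) {
  out <- numeric(length(x))
  for (i in seq_len(length(ep) - 1L)) {
    idx <- (ep[i] + 1L):ep[i + 1L]
    out[idx] <- cummax(x[idx])
  }
  out
}

# Illustrative values with 4, 4 and 2 observations per day
x   <- c(3, 5, 1, 2, 4, 6, 2, 5, 1, 3)
day <- rep(c("2010-08-16", "2010-08-17", "2010-08-18"), times = c(4, 4, 2))

ep       <- day_endpoints(day)
drawdown <- running_max_by_block(x, ep) - x
```

With a zoo series, `x` would presumably be `coredata(z)` and `day` something like `as.Date(index(z))`; the result can then be wrapped back with `zoo(drawdown, index(z))`.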
Hello, I will study your answer, thanks. Meanwhile I managed to do this: `tapply(z, as.Date(index(z)), cummax)`, but it gives me a list, and when I try to convert it back to a zoo object with unlist or unsplit, I get strange things with the dates. Cheers
That's simpler than my answer. You're just missing the `do.call` part: `Z <- do.call(rbind, tapply(z, as.Date(index(z)), cummax))`
Joshua Ulrich
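On the `unsplit()` route mentioned in the comment: one way to sidestep date trouble is to run `split()` / `lapply()` / `unsplit()` on the plain numeric values rather than on the zoo object, since `unsplit()` puts each group's results back in the original row order and the timestamps can simply be reattached afterwards. A minimal base-R sketch, assuming POSIXct timestamps and taking calendar days in UTC only to keep the example deterministic:

```r
# Three days of 6-hourly timestamps (assumption: POSIXct, days in UTC)
times <- as.POSIXct("2010-08-16 00:00", tz = "UTC") + (0:9) * 6 * 3600
x     <- c(2, 1, 4, 3, 5, 2, 7, 6, 1, 8)   # illustrative values
day   <- format(times, "%Y-%m-%d", tz = "UTC")

# cummax within each day, reassembled in the original order of x
running_max <- unsplit(lapply(split(x, day), cummax), day)
drawdown    <- running_max - x
```

With zoo loaded, `zoo(drawdown, times)` would turn the result back into a series indexed by the original timestamps.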
A: 

Thank you.

Now I'll check whether it's really faster than looping through the time index with my own cummax.

What if I need the first value of each day instead of the result of cummax? I mean a vector of the same size (aggregate gives me a vector with one element per day instead). If I use head, tapply complains because it needs a vector.
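One way to get a same-length vector is `ave()`, which applies `FUN` to each group and writes the results back into the original positions. A sketch with plain vectors; `first_of_group()` is a hypothetical helper, and the day labels are assumed to be precomputed (for a zoo series, something like `as.Date(index(z))`, as in the `tapply()` comment above).

```r
# Hypothetical helper: repeat a group's first value across the whole group
first_of_group <- function(v) rep(v[1L], length(v))

x   <- c(2, 1, 4, 3, 5, 2, 7, 6, 1, 8)   # illustrative values
day <- rep(c("2010-08-16", "2010-08-17", "2010-08-18"), times = c(4, 4, 2))

# Same length as x: each row gets the first observation of its day
first_of_day <- ave(x, day, FUN = first_of_group)
```

For a zoo series the analogous call would presumably be `ave(coredata(z), as.Date(index(z)), FUN = first_of_group)`.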
A: 

A crude method: make a matrix whose rows (or columns) correspond to the chunks over which the calculation is to be done, then use `apply`.

x <- rnorm(240) # imagine this to be 10 days of hourly data
xm <- matrix(x, ncol=24, byrow=TRUE)
daily.avg <- apply(xm, 1, mean)
plot(x)
lines(12 + seq(1,240,24), daily.avg)
dan
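The matrix approach also covers the original per-day running maximum, provided every day has exactly 24 rows. One detail to watch: when `FUN` returns a vector, `apply(xm, 1, FUN)` returns one column per row of `xm`, so the result is 24-by-10 rather than 10-by-24. A sketch under the same complete-hourly-data assumption as the code above:

```r
set.seed(1)
x  <- rnorm(240)                         # assume 10 days of complete hourly data
xm <- matrix(x, ncol = 24, byrow = TRUE) # one row per day

# apply() over rows returns a 24-by-10 matrix: column j is cummax of day j
per_day <- apply(xm, 1, cummax)

# Column-major flattening walks day 1, then day 2, ..., matching x's order
running_max <- as.vector(per_day)
drawdown    <- running_max - x

# t(per_day) has the same 10-by-24 layout as xm, if a matrix is preferred
running_max_mat <- t(per_day)
```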
A: 

Hello dan, the problem with the method you suggest is that some data may be missing, so the chunks won't all be the same size.
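Grouping on a day label instead of reshaping into a fixed-width matrix avoids that problem, since each day contributes however many rows it actually has. A sketch with three days of hourly POSIXct timestamps, three of which are dropped to simulate gaps; the placeholder values and the UTC time zone are assumptions made only for illustration.

```r
# 72 hourly timestamps starting at midnight UTC (three full days)
times <- as.POSIXct("2010-08-16 00:00", tz = "UTC") + (0:71) * 3600
x     <- seq_along(times)            # placeholder values 1..72
keep  <- -c(5, 30, 31)               # drop three hours to simulate gaps
times <- times[keep]
x     <- x[keep]

day       <- format(times, "%Y-%m-%d", tz = "UTC")
daily_avg <- tapply(x, day, mean)    # one mean per day, whatever the group size
daily_n   <- table(day)              # rows actually present per day
```

For per-observation results such as `cummax`, the same `day` grouping can be passed to `ave()` or to `split()` / `unsplit()` instead of `tapply()`.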

A: 

It can be done in one line using `ave`:

> library(zoo)
> set.seed(123)
> z <- zoo(rnorm(10), chron(0:9/5))
>
> ave(coredata(z), as.Date(time(z)), FUN = cummax) - z
(01/01/70 00:00:00) (01/01/70 04:48:00) (01/01/70 09:36:00) (01/01/70 14:24:00) (01/01/70 19:12:00) (01/02/70 00:00:00) (01/02/70 04:48:00) 
           0.000000            0.000000            0.000000            1.488200            1.429421            0.000000            1.254149 
(01/02/70 09:36:00) (01/02/70 14:24:00) (01/02/70 19:12:00) 
           2.980126            2.401918            2.160727 
G. Grothendieck