





I have a time series (a zoo object with a chron index) and I need to calculate
cummax(mydata) - mydata separately for each day, getting a zoo time series back.

I've tried aggregate(mydata, as.Date, cummax), but aggregate can only produce a single scalar result for each subset, not a vector.

I've read that this might be possible with tapply, lapply, plyr, cut, or rollapply, but I couldn't get it to work.

Can anyone help me, please?


A:

zoo has a cummax method, so you shouldn't have any issues getting a zoo result. Perhaps you're making this more difficult than it is... is this what you want?

> library(zoo)
> library(chron)  # for as.chron()
> set.seed(21)
> z <- zoo(runif(10),as.chron(Sys.Date()-10:1))
> merge(z,cummax=cummax(z),diff=cummax(z)-z)
                  z    cummax      diff
08/09/10 0.66754012 0.6675401 0.0000000
08/10/10 0.93521022 0.9352102 0.0000000
08/11/10 0.05818433 0.9352102 0.8770259
08/12/10 0.61861583 0.9352102 0.3165944
08/13/10 0.17491846 0.9352102 0.7602918
08/14/10 0.03767539 0.9352102 0.8975348
08/15/10 0.52531317 0.9352102 0.4098971
08/16/10 0.28218425 0.9352102 0.6530260
08/17/10 0.49904520 0.9352102 0.4361650
08/18/10 0.63382510 0.9352102 0.3013851

Since that's pretty easy, I'm guessing your time-series is an intraday frequency. If that's the case, the code is more involved, but this should do the trick:

> require(xts)  # for the endpoints() function
> set.seed(21)
> z <- zoo(runif(10),as.chron(Sys.Date()-seq(0.5,3,length.out=10)))
> ep <- endpoints(z,"days")
> Z <- lapply(1:(length(ep)-1), function(x) cummax(z[(ep[x]+1):ep[x+1]]))
> Z <- do.call(rbind, Z)
> merge(z,Z,Z-z)
                            z         Z     Z - z
(08/16/10 00:00:00) 0.8493961 0.8493961 0.0000000
(08/16/10 06:40:00) 0.9860037 0.9860037 0.0000000
(08/16/10 13:20:00) 0.1721917 0.9860037 0.8138120
(08/16/10 20:00:00) 0.1018046 0.9860037 0.8841991
(08/17/10 02:40:00) 0.9186834 0.9186834 0.0000000
(08/17/10 09:20:00) 0.9596138 0.9596138 0.0000000
(08/17/10 16:00:00) 0.1844608 0.9596138 0.7751531
(08/17/10 22:40:00) 0.6992523 0.9596138 0.2603615
(08/18/10 05:20:00) 0.2524456 0.2524456 0.0000000
(08/18/10 12:00:00) 0.7861149 0.7861149 0.0000000
Joshua Ulrich
Hello, I will study your answer, thanks. Meanwhile I got as far as tapply(z, as.Date(index(z)), cummax), but it gives me a list, and when I try to convert it to a zoo object with unlist or unsplit, I get strange results with the dates. Cheers.
That's simpler than my answer. You're just missing the `do.call` part: `Z <- do.call(rbind, tapply(z, as.Date(index(z)), cummax))`
Joshua Ulrich
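
For anyone following along, here is a minimal base R sketch of the same split, apply, recombine idea on a plain numeric vector. The `times` and `values` data are made up for illustration, and `unsplit` stands in for the `do.call(rbind, ...)` step, which is what applies to zoo objects; this is only meant to show the shape of the computation, not to replace the zoo version above.

```r
# Hypothetical intraday data: 8 observations, 6 hours apart, covering two days
times  <- seq(as.POSIXct("2010-08-16 00:00:00", tz = "UTC"),
              by = "6 hours", length.out = 8)
values <- c(5, 3, 8, 1, 9, 2, 7, 4)

# Grouping variable: the calendar day of each observation
day <- as.Date(times, tz = "UTC")

# Split by day, take the running maximum within each day,
# then put the pieces back in the original order
pieces      <- split(values, day)
running     <- lapply(pieces, cummax)
running_max <- unsplit(running, day)

# Distance of each value from its running daily maximum
drawdown <- running_max - values
print(data.frame(times, values, running_max, drawdown))
```

With a plain vector the dates are lost along the way, which is why the zoo version recombines the per-day zoo pieces with `rbind` instead.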

Thank you.

Now I'll check whether it's really faster than looping over the time index with my own cummax implementation.

What if I need the first value of each day instead of the result of cummax? I mean a vector of the same size as the data (aggregate gives me a vector with one element per day instead). If I use head, tapply complains because it needs a vector.
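
One possible approach, sketched here in base R on made-up data, is `ave` with a function that repeats the first element of each group. The `times` and `values` vectors and the name `first_of_day` are assumptions for illustration; with a zoo object the same call would presumably take `coredata(z)` and `as.Date(time(z))` as its first two arguments.

```r
# Hypothetical intraday data: 8 observations, 6 hours apart, covering two days
times  <- seq(as.POSIXct("2010-08-16 00:00:00", tz = "UTC"),
              by = "6 hours", length.out = 8)
values <- c(5, 3, 8, 1, 9, 2, 7, 4)

# Grouping variable: the calendar day of each observation
day <- as.Date(times, tz = "UTC")

# ave() returns a vector as long as its input, so repeating the first
# value of each day keeps one entry per observation
first_of_day <- ave(values, day, FUN = function(x) rep(x[1], length(x)))
print(first_of_day)
```

Because `ave` always returns a result of the same length as its input, it avoids the one-element-per-day output of `aggregate`.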

A crude method: make a matrix with rows or columns corresponding to the chunks over which the calculation is to be done, then use apply

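x <- rnorm(240)  # imagine this to be 10 days of hourly data
xm <- matrix(x, ncol=24, byrow=TRUE)  # one row per day, one column per hour
daily.avg <- apply(xm, 1, mean)  # mean of each row, i.e. of each day
plot(x, type="l")  # lines() needs an existing plot to draw on
lines(12 + seq(1,240,24), daily.avg)  # daily means placed at mid-day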

Hello Dan, the problem with the method you suggest is that some data may be missing, so the chunks won't all be the same size.
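
To illustrate the point about unequal chunks, here is a small base R sketch that groups by calendar day instead of reshaping into a matrix. The data are made up (two days of hourly observations with three of them dropped), and `daily_avg` is a hypothetical name; it only shows that grouping by date does not care how many observations each day has.

```r
# Two days of hourly timestamps, with three observations removed
# to mimic missing data
times <- seq(as.POSIXct("2010-08-16 00:00:00", tz = "UTC"),
             by = "hour", length.out = 48)
times <- times[-c(3, 30, 31)]

# Made-up values: simply 1, 2, 3, ... so the means are easy to check
x <- as.numeric(seq_along(times))

# Group by calendar day; the days now have 23 and 22 observations
day <- as.Date(times, tz = "UTC")
print(table(day))

# Daily mean computed per group, regardless of group size
daily_avg <- tapply(x, day, mean)
print(daily_avg)
```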


It can be done in one line using ave:

> library(zoo)
> library(chron)  # for chron()
> set.seed(123)
> z <- zoo(rnorm(10), chron(0:9/5))
> ave(coredata(z), as.Date(time(z)), FUN = cummax) - z
(01/01/70 00:00:00) (01/01/70 04:48:00) (01/01/70 09:36:00) (01/01/70 14:24:00) (01/01/70 19:12:00) (01/02/70 00:00:00) (01/02/70 04:48:00) 
           0.000000            0.000000            0.000000            1.488200            1.429421            0.000000            1.254149 
(01/02/70 09:36:00) (01/02/70 14:24:00) (01/02/70 19:12:00) 
           2.980126            2.401918            2.160727 
G. Grothendieck