tags:

views:

124

answers:

2

basically i want to perform diagonal averaging in R. Below is some code adapted from the simsalabim package to do the diagonal averaging. Only this is slow.

Any suggestions for vectorizing this instead of using sapply?

reconSSA <- function(S,v,group=1){
### S : matrix
### v : vector

    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- row(XX)+col(XX)-1
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- S[,group] %*% t(t(XX) %*% S[,group])

    ##Diagonal Averaging
    .intFun <- function(i,x,ind) mean(x[ind==i])

    RC <- sapply(1:N,.intFun,x=XX,ind=IND)
    return(RC)
}

For data you could use the following

data(AirPassengers)
v <- AirPassengers
L <- 30
T <- length(v)
K <- T-L+1

x.b <- matrix(nrow=L,ncol=K)
x.b <- matrix(v[row(x.b)+col(x.b)-1],nrow=L,ncol=K)
S <- eigen(x.b %*% t(x.b))[["vectors"]] 
out <- reconSSA(S, v, 1:10)
A: 

I can't get your example to produce sensible results. I think there are several errors in your function.

  1. XX is used in sapply, but is not defined in the function
  2. sapply works over 1:N, where N=144 in your example, but x.b only has 115 columns
  3. reconSSA simply returns x

Regardless, I think you want:

data(AirPassengers)
x <- AirPassengers
rowMeans(embed(x,30))

UPDATE: I've re-worked and profiled the function. Most of the time is spent in mean, so it may be hard to get this much faster using R code. That said, you can 20% speedup by using sum instead.

reconSSA <- function(S,v,group=1){

    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- row(XX)+col(XX)-1
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- S[,group] %*% t(t(XX) %*% S[,group])

    ##Diagonal Averaging
    .intFun <- function(i,x,ind) {
        I <- ind==i
        sum(x[I])/sum(I)
    }

    RC <- sapply(1:N,.intFun,x=XX,ind=IND)
    return(RC)
}
Joshua Ulrich
That's not quite what I am looking for. The idea is to use the structure of x.b (Hankel) and average the anti-diagonals, since we will seek an approximation of x.b , which will likely not have the proper structure(Hankel), so using diagonal averaging alleviates this problem to some extent. This falls under the topic of singular spectrum analysis. I also fixed the referencing you mentioned.
pslice
I'll take another shot at it if you can explain what you expect the call to `sapply` to do. Your intent is not clear from the code.
Joshua Ulrich
What I am expecting it to do is on the bottom of page 3. See the link for a PDF. http://bit.ly/ati1ll
pslice
thanks again for your help. this has been extremely helpful.
pslice
+2  A: 

You can speed up the computation by almost 10 times with the help of a very specialized trick with rowsum:

reconSSA_1 <- function(S,v,group=1){
### S : matrix
### v : vector
    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- row(XX)+col(XX)-1
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- S[,group] %*% t(t(XX) %*% S[,group])
    ##Diagonal Averaging
    SUMS <- rowsum.default(c(XX), c(IND))
    counts <- if(L <= K) c(1:L, rep(L, K-L-1), L:1)
    else c(1:K, rep(K, L-K-1), K:1)
    c(SUMS/counts)
}

all.equal(reconSSA(S, v, 1:10), reconSSA_1(S, v, 1:10))
[1] TRUE

library(rbenchmark)

benchmark(SSA = reconSSA(S, v, 1:10),
          SSA_1 = reconSSA_1(S, v, 1:10),
          columns = c( "test", "elapsed", "relative"),
          order = "relative")


    test elapsed relative
2 SSA_1    0.23   1.0000
1   SSA    2.08   9.0435

[Update: As Joshua suggested it could be speed up even further by using the crux of the rowsum code:

reconSSA_2 <- function(S,v,group=1){
### S : matrix
### v : vector
    N <- length(v)
    L <- nrow(S)
    K <- N-L+1
    XX <- matrix(0,nrow=L,ncol=K)
    IND <- c(row(XX)+col(XX)-1L)
    XX <- matrix(v[row(XX)+col(XX)-1],nrow=L,ncol=K)
    XX <- c(S[,group] %*% t(t(XX) %*% S[,group]))
    ##Diagonal Averaging
    SUMS <- .Call("Rrowsum_matrix", XX, 1L, IND, 1:N, 
                  TRUE, PACKAGE = "base")
    counts <- if(L <= K) c(1:L, rep(L, K-L-1), L:1)
    else c(1:K, rep(K, L-K-1), K:1)
    c(SUMS/counts)
}

   test elapsed  relative
3 SSA_2   0.156  1.000000
2 SSA_1   0.559  3.583333
1   SSA   5.389 34.544872

A speedup of x34.5 comparing to original code!!

]

VitoshKa
Very nice vectorization with `rowsums`!
Joshua Ulrich
wow. that's great. i hadn't quite thought of it that way.
pslice
You could make it even faster by only using the parts of `rowsums` that you need: (i.e. the `sort(unique(...))` and the `.Call("Rrowsum_matrix", ...)`.
Joshua Ulrich
:) rowsum was written for speed, I wish there were more funcs like this, specialized for grouped operations on arrays.
VitoshKa
@Joshua I remember experimenting with that, a half a year or so ago. It indeed speeds the stuff another 4 times, as far as I could remember. But one small error in inputs and your R session is gone:). Reorder might not be necessary for this example. So it might be quite worth rewriting the rowsum a bit.
VitoshKa
It turned out that for this particular problem, unique was not necessary and the use of integer groups speed up the whole computation quite considerably.
VitoshKa