ansaurus

Question

Applying a function to a distance matrix in R.

Answer 1

+2 A:

My solution is to get the indexes of the distance vector, given a row and the size of the matrix. I got this from codeguru

int Trag_noeq(int row, int col, int N)
{
   //assert(row != col);    //You can add this in if you like
   if (row<col)
      return row*(N-1) - (row-1)*((row-1) + 1)/2 + col - row - 1;
   else if (col<row)
      return col*(N-1) - (col-1)*((col-1) + 1)/2 + row - col - 1;
   else
      return -1;
}

After translating to R, assuming indexes start at 1, and assuming a lower tri instead of upper tri matrix I got.
EDIT: Using the vectorized version contributed by rcs

noeq.1 <- function(i, j, N) {
    i <- i-1
    j <- j-1
    ix <- ifelse(i < j,
                 i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i,
                 j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j) * ifelse(i == j, 0, 1)
    ix
}

## To get the indexes of the row, the following one liner works:

getrow <- function(z, N) noeq.1(z, 1:N, N)

## to get the row sums

getsum <- function(d, f=sum) {
    N <- attr(d, "Size")
    sapply(1:N, function(i) {
        if (i%%100==0) print(i)
        f(d[getrow(i,N)])
    })
}

So, with the example:

sumd2 <- getsum(d)

This was much slower than as.matrix for small matrices before vectorizing. But just about 3x as slow after vectorizing. In a Intel Core2Duo 2ghz applying the sum by row of the size 10000 matrix took just over 100s. The as.matrix method fails. Thanks rcs!

Eduardo Leoni 2009-11-07 07:41:24

Answer 2

+2 A:

This is a vectorized version of the function noeq (either argument i or j):

noeq.1 <- function(i, j, N) {
    i <- i-1
    j <- j-1
    ifelse(i < j,
           i*(N-1) - ((i-1)*i)/2 + j - i,
           j*(N-1) - ((j-1)*j)/2 + i - j) * ifelse(i == j, 0, 1)
}   

> N <- 4
> sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N)))
     [,1] [,2] [,3] [,4]
[1,]    0    1    2    3
[2,]    1    0    4    5
[3,]    2    4    0    6
[4,]    3    5    6    0
> sapply(1:N, function(i) noeq.1(i, 1:N, N))
     [,1] [,2] [,3] [,4]
[1,]    0    1    2    3
[2,]    1    0    4    5
[3,]    2    4    0    6
[4,]    3    5    6    0

Timings are done on a 2.4 GHz Intel Core 2 Duo (Mac OS 10.6.1):

> N <- 1000
> system.time(sapply(1:N, function(j) noeq.1(1:N, j, N)))
   user  system elapsed 
  0.676   0.061   0.738 
> system.time(sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N))))
   user  system elapsed 
 14.359   0.032  14.410

rcs 2009-11-07 11:48:54

Good example of how R can be fast: 20x improvement!

Eduardo Leoni 2009-11-07 15:59:03

Answer 3

+3 A:

First of all, for anyone who hasn't seen this yet, I strongly recommend reading this article on the r-wiki about code optimization.

Here's another version without using ifelse (that's a relatively slow function):

noeq.2 <- function(i, j, N) {
    i <- i-1
    j <- j-1
    x <- i*(N-1) - (i-1)*((i-1) + 1)/2 + j - i
    x2 <- j*(N-1) - (j-1)*((j-1) + 1)/2 + i - j
    idx <- i < j
    x[!idx] <- x2[!idx]
    x[i==j] <- 0
    x
}

And timings on my laptop:

> N <- 1000
> system.time(sapply(1:N, function(i) sapply(1:N, function(j) noeq(i, j, N))))
   user  system elapsed 
  51.31    0.10   52.06 
> system.time(sapply(1:N, function(j) noeq.1(1:N, j, N)))
   user  system elapsed 
   2.47    0.02    2.67 
> system.time(sapply(1:N, function(j) noeq.2(1:N, j, N)))
   user  system elapsed 
   0.88    0.01    1.12

And lapply is faster than sapply:

> system.time(do.call("rbind",lapply(1:N, function(j) noeq.2(1:N, j, N))))
   user  system elapsed 
   0.67    0.00    0.67

Shane 2009-11-07 20:19:21

Hi Shane, only to mention that your link is broken, here is the correct one - http://rwiki.sciviews.org/doku.php?id=tips:programming:code_optim2

Tal Galili 2010-06-13 13:12:16

Answer 4

A:

Oh! It's very usefull and fantastic .

Thank you alot .

Khaled 2010-01-15 16:36:05

This is a comment, not an answer.

Shane 2010-01-15 16:49:21

ansaurus

tags:

views:

answers:

Applying a function to a distance matrix in R.

related questions