I have a matrix filled with TRUE/FALSE values and I am trying to find the index position of the first TRUE value on each row (or return NA if there is no TRUE value in the row). The following code gets the job done, but it uses an apply() call, which I believe is just a wrapper around a for loop. I'm working with some large datasets and performance is suffering. Is there a faster way?

> x <- matrix(rep(c(F,T,T),10), nrow=10)
> x
       [,1]  [,2]  [,3]
 [1,] FALSE  TRUE  TRUE
 [2,]  TRUE  TRUE FALSE
 [3,]  TRUE FALSE  TRUE
 [4,] FALSE  TRUE  TRUE
 [5,]  TRUE  TRUE FALSE
 [6,]  TRUE FALSE  TRUE
 [7,] FALSE  TRUE  TRUE
 [8,]  TRUE  TRUE FALSE
 [9,]  TRUE FALSE  TRUE
[10,] FALSE  TRUE  TRUE

> apply(x,1,function(y) which(y)[1])
 [1] 2 1 1 2 1 1 2 1 1 2
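The `apply()` version already meets the NA requirement. For a row with no TRUE, `which(y)` is `integer(0)`, and indexing that with `[1]` gives `NA`. Here is a small check using a hypothetical 3 x 3 test matrix `m` whose second row is all FALSE:

```r
# Hypothetical 3 x 3 test matrix; row 2 contains no TRUE
m <- matrix(c(FALSE, FALSE, TRUE,
              FALSE, FALSE, FALSE,
              TRUE,  FALSE, TRUE), nrow = 3, byrow = TRUE)

# which() returns integer(0) for row 2, and integer(0)[1] is NA
first_apply <- apply(m, 1, function(y) which(y)[1])
print(first_apply)  # 3 NA 1
```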
A:

Not sure this is any better, but this is one solution:

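> x2 <- t(t(x) * seq_len(ncol(x)))   # each TRUE becomes its column number, each FALSE 0
> x2[x2 == 0] <- Inf
> rowMins(x2)                         # not base R (see comments below); rows with no TRUE give Inf, not NA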
 [1] 2 1 1 2 1 1 2 1 1 2
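If you would rather avoid an add-on package, here is a minimal base-R sketch of the same idea. It is a variant rather than the code above. It builds the column numbers with `col()` instead of multiplying. It takes the row-wise minimum with `pmin()` applied across the columns. It assumes you want NA rather than Inf for rows without a TRUE.

```r
x <- matrix(rep(c(FALSE, TRUE, TRUE), 10), nrow = 10)

# Column number where x is TRUE, Inf where it is FALSE
idx <- col(x)
idx[!x] <- Inf

# pmin() across the columns gives the smallest column number per row
first <- do.call(pmin, as.data.frame(idx))
first[is.infinite(first)] <- NA  # rows with no TRUE
print(first)  # 2 1 1 2 1 1 2 1 1 2
```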

Edit: Here's a better solution using base R:

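> w <- which(x, arr.ind=TRUE)          # (row, col) of every TRUE, in column-major order
> w <- w[order(w[,1]),]                # sort by row; order() is stable, so columns stay ascending
> w[c(TRUE, diff(w[,1]) != 0), 2]      # keep the first entry of each row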
 [1] 2 1 1 2 1 1 2 1 1 2
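This version silently drops rows that have no TRUE, so the result can be shorter than `nrow(x)` instead of containing NA. The sketch below keeps one entry per row. It sorts explicitly by row and then column. It uses `duplicated()` in place of the `diff()` test to pick the first hit per row. It then writes the hits into an NA-filled vector. The empty row 4 is an assumption added for illustration.

```r
x <- matrix(rep(c(FALSE, TRUE, TRUE), 10), nrow = 10)
x[4, ] <- FALSE  # make row 4 empty to show the NA case

# (row, col) of every TRUE, sorted by row and then by column
w <- which(x, arr.ind = TRUE)
w <- w[order(w[, 1], w[, 2]), , drop = FALSE]

# First TRUE of each row that has one
first <- w[!duplicated(w[, 1]), , drop = FALSE]

# Fill a full-length result; rows never assigned stay NA
res <- rep(NA_integer_, nrow(x))
res[first[, 1]] <- first[, 2]
print(res)  # 2 1 1 NA 1 1 2 1 1 2
```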
Shane
Thanks Shane, that gets the job done.
Abiel
You need to load fBasics or fUtilities first, I think...
John
Good catch: `require(fBasics)`.
Shane
A: 

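You can gain a lot of speed by working on the linear indices returned by which() and converting them into row and column numbers with %% (modulo) and %/% (integer division):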

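x <- matrix(rep(c(F,T,T),10), nrow=10)

# 0-based linear indices of the TRUEs in t(x); they come out ordered row by row of x,
# and left to right within each row
z <- which(t(x))-1
# z %% ncol(x) is the 0-based column of x, z %/% ncol(x) the 0-based row;
# match() takes the first hit for each row and gives NA for rows without a TRUE
((z%%ncol(x))+1)[match(1:nrow(x), (z%/%ncol(x))+1)]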
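Here are the same two lines wrapped as a reusable function. This is a sketch, and the name `first_true()` is hypothetical. It is followed by a quick check that a row with no TRUE comes back as NA.

```r
# Hypothetical helper: column index of the first TRUE in each row of a logical matrix
first_true <- function(x) {
  z <- which(t(x)) - 1                 # 0-based linear indices, ordered row by row of x
  rows <- z %/% ncol(x) + 1            # row of x for each TRUE
  cols <- z %% ncol(x) + 1             # column of x for each TRUE
  cols[match(seq_len(nrow(x)), rows)]  # first hit per row, NA if none
}

x <- matrix(rep(c(FALSE, TRUE, TRUE), 10), nrow = 10)
x[4, ] <- FALSE  # row 4 now has no TRUE
print(first_true(x))  # 2 1 1 NA 1 1 2 1 1 2
```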

This can be adapted as needed: to find the first TRUE in each column instead, skip the transpose and swap the roles of nrow() and ncol().
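Here is a sketch of that column-wise variant. The name `first_true_by_col()` is hypothetical. It returns the row index of the first TRUE in each column. Because which() already walks a matrix column by column, no transpose is needed.

```r
# Hypothetical helper: row index of the first TRUE in each column of a logical matrix
first_true_by_col <- function(x) {
  z <- which(x) - 1                    # column-major order: each column top to bottom
  rows <- z %% nrow(x) + 1             # row of x for each TRUE
  cols <- z %/% nrow(x) + 1            # column of x for each TRUE
  rows[match(seq_len(ncol(x)), cols)]  # first hit per column, NA if none
}

x <- matrix(rep(c(FALSE, TRUE, TRUE), 10), nrow = 10)
print(first_true_by_col(x))  # 2 1 1
```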

Tried out on a 1,000,000 x 5 matrix:

> x <- matrix(sample(c(F,T),5000000,replace=T), ncol=5)

> system.time(apply(x,1,function(y) which(y)[1]))
   user  system elapsed 
  12.61    0.07   12.70 

> system.time({
+ z <- which(t(x))-1
+ (z%%ncol(x)+1)[match(1:nrow(x), (z%/%ncol(x))+1)]}
+ )
   user  system elapsed 
   1.11    0.00    1.11 

On this example the %%-based version is more than ten times faster (1.11 s versus 12.70 s elapsed).
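Before swapping one approach for the other, it may be worth confirming that they agree. The sketch below uses a smaller random matrix; the size, seed, and TRUE probability are arbitrary choices. The low TRUE rate makes some rows come out all FALSE. The results are compared as integers because the %%-based result is double, while `apply()` returns integers.

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- matrix(sample(c(FALSE, TRUE), 5000, replace = TRUE, prob = c(0.8, 0.2)), ncol = 5)

slow <- apply(x, 1, function(y) which(y)[1])

z <- which(t(x)) - 1
fast <- (z %% ncol(x) + 1)[match(1:nrow(x), (z %/% ncol(x)) + 1)]

# Same positions, and NA in the same places for rows without a TRUE
print(identical(slow, as.integer(fast)))  # TRUE
```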

Joris Meys
If a row contains no TRUE, this solution returns NA for it.
Joris Meys