ansaurus

Question

Answer 1

A:

You have something that works. Are you worried about speed for some reason? Here's an alternative:

y<-c(0,3,2,1,0,0,2,5,0,1,0,0)

# Map a single value to 0 if it is zero, and to 1 otherwise
decider <- function(x) {
   if (x == 0) {
      return(0)
   }

   return(1)
}

# Apply decider to each element of y
b <- sapply(y, decider)

James Thompson 2009-10-07 05:08:17

Out of curiosity: is that any faster than the original version?

Shane 2009-10-07 05:47:37

Answer 2

+5 A:

Try this:

b <- rep(0, length(y))
b[y != 0] <- 1

This is efficient because y and b are the same length, so the logical index y != 0 lines up with b element by element, and both rep() and the indexed assignment are vectorized.
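
A minimal sketch of the two steps on the question's sample vector. The intermediate name `nonzero` is introduced here only for illustration and is not part of the original answer.

```r
y <- c(0, 3, 2, 1, 0, 0, 2, 5, 0, 1, 0, 0)

# Logical mask: TRUE wherever y is non-zero, computed for the whole vector at once
nonzero <- y != 0

# Start from all zeros, then overwrite the masked positions with 1
b <- rep(0, length(y))
b[nonzero] <- 1

b
```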

Edit: Here's another approach:

b <- ifelse(y == 0, 0, 1)

The ifelse() function is also vectorized.

Shane 2009-10-07 05:11:38

Using ifelse is much less efficient than your first suggestion. ifelse creates some vectors along the way which slows things down when y is very large.

Rob Hyndman 2009-10-07 05:35:01

Thanks Rob. Good to know! Just wanted to show other approaches so that people can add them to their toolkit and stop all this unnecessary iteration. Your approach is very efficient.

Shane 2009-10-07 05:41:18

"to show other approaches" - that's exactly why I liked Shane's answer to my previous question. If a person really wants to learn, he would normally be interested in various ways of doing one and the same thing, for the sake of learning and out of curiosity. It seems I can only accept one answer, though.

knot 2009-10-07 05:54:36

I always time various approaches using Sys.time() (see my answer below). I can't even tell you how surprised I've been by results over the years.

Vince 2009-10-07 05:57:57

Answer 3

+4 A:

b <- as.numeric(y!=0)

Rob Hyndman 2009-10-07 05:12:23

This is about the same speed as Shane's first suggestion, but somewhat neater. Both are much faster than any of the other suggestions given.

Rob Hyndman 2009-10-07 05:40:34

You could also drop the `as.numeric`.

hadley 2009-10-07 13:33:56
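
To illustrate the comment above: a logical vector is coerced to 0/1 when it is used in arithmetic, so the conversion is often optional. A small sketch, assuming the sample `y` from the question; the names `b_lgl` and `b_num` are made up for this example.

```r
y <- c(0, 3, 2, 1, 0, 0, 2, 5, 0, 1, 0, 0)

b_lgl <- y != 0               # logical vector: FALSE/TRUE
b_num <- as.numeric(b_lgl)    # explicit 0/1 version

sum(b_lgl)                    # counts the non-zero entries, since TRUE is treated as 1
mean(b_lgl)                   # proportion of non-zero entries
all(b_lgl == b_num)           # the two agree element-wise after coercion
```

Whether to keep `as.numeric` depends on what happens next: code that checks the storage type (for example with `is.numeric()`) will still see a logical vector if the conversion is dropped.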

Answer 4

+2 A:

Use ifelse(). This is vectorized and (edit: somewhat) fast.

> y <- c(0,3,2,1,0,0,2,5,0,1,0,0)
> b <- ifelse(y == 0, 0, 1)
> b
 [1] 0 1 1 1 0 0 1 1 0 1 0 0

Edit 2: This approach is slower than the as.numeric(y!=0) approach.

> t <- Sys.time(); b <- as.numeric(y!=0); Sys.time() - t # Rob's approach
Time difference of 0.0002379417 secs
> t <- Sys.time(); b <- ifelse(y==0, 0, 1); Sys.time() - t # Shane's 2nd and my approach
Time difference of 0.000428915 secs
> t <- Sys.time(); b = sapply( y, decider ); Sys.time() - t # James's approach
Time difference of 0.0004429817 secs

But to some, ifelse may be trivially more readable than the as.numeric approach.

Note that the OP's version took 0.0004558563 secs to run.

Vince 2009-10-07 05:37:39

You need to time such things on much longer vectors to get good estimates. Enlarge the vector until the slowest method takes about 10 secs.

Thierry 2009-10-07 06:53:10
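
A hedged sketch of the suggestion above, using `system.time()` from base R on a longer vector. The vector length, the seed, and the names `y_big`, `b1` to `b4`, and `elapsed` are arbitrary choices made here, not taken from the thread. Timings vary by machine and R version, so none are shown.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n <- 1e5                                      # assumed size; enlarge until differences are measurable
y_big <- sample(0:5, n, replace = TRUE)

decider <- function(x) if (x == 0) 0 else 1   # element-wise helper, as in Answer 1

t_index   <- system.time(b1 <- { b <- rep(0, n); b[y_big != 0] <- 1; b })
t_numeric <- system.time(b2 <- as.numeric(y_big != 0))
t_ifelse  <- system.time(b3 <- ifelse(y_big == 0, 0, 1))
t_sapply  <- system.time(b4 <- sapply(y_big, decider))

# Elapsed wall-clock seconds for each approach
elapsed <- c(index      = t_index[["elapsed"]],
             as.numeric = t_numeric[["elapsed"]],
             ifelse     = t_ifelse[["elapsed"]],
             sapply     = t_sapply[["elapsed"]])
elapsed

# All four approaches should produce the same result
stopifnot(identical(b1, b2), identical(b2, b3), identical(b3, b4))
```

A single run can still be noisy; repeating each measurement a few times and comparing the spread gives a more trustworthy picture.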

@Thierry: I agree completely, I'm just lazy :-) I actually repeated these in the terminal multiple times to ensure they were somewhat consistent.

Vince 2009-10-07 07:26:05

For this case, it's a bit of a waste of time - you probably spend a million times more time on the timings than actually running the code. There's no need to profile until you discover that performance is actually a problem.

hadley 2009-10-07 13:35:30

How else do we see which is the most efficient solution (the original question) without timing?

Vince 2009-10-07 15:28:46

Answer 5

+1 A:

b<-(y!=0)+0

b
[1] 0 1 1 1 0 0 1 1 0 1 0 0

DWin 2009-10-08 00:30:53


R: how do I dichotomise efficiently
