I have a function that is currently written in a procedural, loop-based style, and I want to speed it up and, if possible, solve the problem more in the spirit of R. I have a data.frame and want to add a column in which every entry depends on two rows: the current one and the one before it. At the moment it looks like the following:

faultFinging <- function(heartData){
    if(heartData$Pulse[[1]] == 0){
        Group <- 0
    }
    else{
        Group <- 1
    }
    for(i in seq(2, length(heartData$Pulse), 1)){
        if(heartData$Pulse[[i-1]] != 0 
            && heartData$Pulse[[i]] != 0
            && abs(heartData$Pulse[[i-1]] - heartData$Pulse[[i]])<20){
            Group[[i]] <- 1
        }
        else{
            if(heartData$Pulse[[i-1]] == 0 && heartData$Pulse[[i]] != 0){
                Group[[i]] <- 1
            }
            else{
                Group[[i]] <- 0
            }
        }
    }
    Pulse<-heartData$Pulse
    Time<-heartData$Time
    return(data.frame(Time,Pulse,Group))
}
+2  A: 

I can't test this without sample data, but this is the general idea. You can avoid the for() loop entirely by using & and |, which are the vectorized versions of && and ||. There is also no need for an if-else statement when the result is simply the truth value of the condition (TRUE or FALSE).

faultFinging <- function(heartData){
    Group <- as.numeric(c(heartData$Pulse[1] != 0,
      (heartData$Pulse[-nrow(heartData)] != 0 
        & heartData$Pulse[-1] != 0
        & abs(heartData$Pulse[-nrow(heartData)] - heartData$Pulse[-1])<20) |
      (heartData$Pulse[-nrow(heartData)] == 0 & heartData$Pulse[-1] != 0)))
    return(cbind(heartData, Group))
}

Wrapping the logical vector in as.numeric() converts TRUE to 1 and FALSE to 0.
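As a rough illustration of the idea, here is a sketch that spells out the intermediate vectors on made-up data. The sample values and the helper names prev, curr, steady and restart are assumptions for illustration, not part of the original question.

```r
# Hypothetical sample data; the Time and Pulse values are made up.
heartData <- data.frame(Time  = 1:10,
                        Pulse = c(1, 0, 1, 1, 4, 25, 2, 0, 25, 0))

prev <- heartData$Pulse[-nrow(heartData)]  # every value except the last
curr <- heartData$Pulse[-1]                # every value except the first

# `&` and `|` work element by element and return one logical per pair.
steady  <- prev != 0 & curr != 0 & abs(prev - curr) < 20
restart <- prev == 0 & curr != 0

# The first row has no predecessor, so it is judged on its own value.
Group <- as.numeric(c(heartData$Pulse[1] != 0, steady | restart))
print(Group)
```

Each row's Group is computed from the shifted vectors in one pass, with no explicit loop.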

Shane
Since `idx` is logical, `group <- as.numeric(idx)` is sufficient.
hadley
+1  A: 

This can be done in a more vectorized way by separating your program into two parts. First, a function which takes two samples, the previous and the current one, and determines whether they meet your pulse specification:

isPulse <- function(previous, current)
{ 
  (previous != 0 & current !=0 & (abs(previous-current) < 20)) |
  (previous == 0 & current !=0)
}

Note the use of the vectorized operators & and | instead of the scalar && and ||.
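To see why the element-wise operators matter, the following sketch calls isPulse once with single values and once with vectors. The input values are made up. With || the vector call would not give one result per pair: depending on the R version, it either looks only at the first element or raises an error.

```r
isPulse <- function(previous, current) {
  (previous != 0 & current != 0 & (abs(previous - current) < 20)) |
  (previous == 0 & current != 0)
}

# Single pair: previous sample is 0, current is non-zero.
single <- isPulse(0, 25)

# Two pairs at once: (0, 25) and (4, 25); one logical comes back per pair.
pairs <- isPulse(c(0, 4), c(25, 25))

print(single)
print(pairs)
```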

And then invoke it, supplying the two vector streams 'previous' and 'current' offset by a suitable delay, in your case, 1:

delay <- 1
samples = length(heartData$pulse)

# samples + 1 - (1:delay) gives the indices of the last delay elements
isPulse(heartData$pulse[-(samples + 1 - (1:delay))], heartData$pulse[-(1:delay)])

Let's try this on some made-up data:

sampleData = c(1,0,1,1,4,25,2,0,25,0)
heartData = data.frame(pulse=sampleData)
samples = length(heartData$pulse)
result = isPulse(heartData$pulse[-(samples + 1 - (1:delay))], heartData$pulse[-(1:delay)])

Note that the code heartData$pulse[-(samples + 1 - (1:delay))] trims delay samples from the end, for the previous stream, and heartData$pulse[-(1:delay)] trims delay samples from the start, for the current stream.
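The trimming described here can also be written with head() and tail(), which accept a negative count meaning "all but that many elements". This is only an alternative sketch, using the same made-up data; the names previous and current are chosen for illustration.

```r
pulse <- c(1, 0, 1, 1, 4, 25, 2, 0, 25, 0)
delay <- 1

previous <- head(pulse, -delay)  # drop the last `delay` samples
current  <- tail(pulse, -delay)  # drop the first `delay` samples

print(previous)
print(current)
```

Both vectors have length(pulse) - delay elements, so they line up pair by pair.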

Doing it manually, the results should be (using F for false and T for true)

F,T,T,T,F,F,F,T,F

and by running it, we find that they are!:

> print(result)
FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE

success!

Since you want to bind these back as a column into your original dataset, note that the new vector is delay elements shorter than your original data, so you need to pad it at the start with delay FALSE elements. You may also want to convert it to 0/1 to match your data:

resultPadded <- c(rep(FALSE,delay), result)
heartData$result = ifelse(resultPadded, 1, 0)

which gives

> heartData
   pulse result
1      1      0
2      0      0
3      1      1
4      1      1
5      4      1
6     25      0
7      2      0
8      0      0
9     25      1
10     0      0
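Padding with FALSE marks the first row as 0, whereas the loop in the question sets the first Group value from the first pulse alone (1 if it is non-zero). A minimal sketch of that variant, assuming the first delay rows should simply be flagged when their pulse is non-zero; the names firstRows and group are hypothetical:

```r
pulse  <- c(1, 0, 1, 1, 4, 25, 2, 0, 25, 0)
delay  <- 1
# Pairwise result from the example above (length(pulse) - delay values).
result <- c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Assumption: rows without a predecessor are judged on their own value.
firstRows <- pulse[1:delay] != 0
group <- as.numeric(c(firstRows, result))
print(group)
```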
Alex Brown