Hello everyone,

I've got data being read into a data frame in R, by column. Some of the columns will only ever increase in value; for those columns only, I want to replace each value (n) with its difference from the previous value in that column. For example, looking at an individual column, I want

c(1,2,5,7,8)

to be replaced by

c(1,3,2,1)

which are the differences between successive elements.

However, it's getting really late in the day, and I think my brain has just stopped working. Here's my code at present:

col1 <- c(1,2,3,4,NA,2,3,1) # This column rises and falls, so we want to ignore it
col2 <- c(1,2,3,5,NA,5,6,7) # Note: this column always rises in value, so we want to replace it with deltas
col3 <- c(5,4,6,7,NA,9,3,5) # This column rises and falls, so we want to ignore it
d <- cbind(col1, col2, col3)
d
fix_data <- function(data) {
    # Iterate through each column...
    for (column in data[,1:dim(data)[2]]) {
     lastvalue <- 0
     # Now walk through each value in the column, 
     # checking to see if the column consistently rises in value
     for (value in column) {
      if (is.na(value) == FALSE) { # Need to ignore NAs
       if (value >= lastvalue) {
        alwaysIncrementing <- TRUE
       } else {
        alwaysIncrementing <- FALSE
        break
       }
      }
     }

     if (alwaysIncrementing) {
      print(paste("Column", column, "always increments"))
     }

     # If a column is always incrementing, alwaysIncrementing will now be TRUE
     # In this case, I want to replace each element in the column with the delta between successive
     # elements.  The size of the column shrinks by 1 in doing this, so just prepend a copy of
     # the 1st element to the start of the list to ensure the column length remains the same
     if (alwaysIncrementing) {
      print(paste("This is an incrementing column:", colnames(column)))
      column <- c(column[1], diff(column, lag=1))
     }
    }
    data
}

fix_data(d)
d

If you copy/paste this code into RGui, you'll see that it doesn't do anything to the supplied data frame.

Besides losing my mind, what am I doing wrong??

Thanks in advance

+3  A: 

Without addressing the code in any detail: you're assigning values to column, which is a local variable within the loop. It holds a copy of the values, so there is no relationship between column and data in that context. You need to assign the new values to the corresponding column of data instead.

Also, data is local to your function, so changing it inside the function does not change d. You need to assign the function's return value back after calling it, e.g. d <- fix_data(d).
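The two points above can be sketched as follows. This is only a minimal illustration, not a definitive rewrite: it loops over column indices so the result can be written back into data, and it uses diff on the non-NA values as the "never decreases" test (my choice, to keep the sketch short). The names j and steps are made up for the example.

```r
fix_data <- function(data) {
  # Loop over column indices so we can write back into data[, j]
  for (j in seq_len(ncol(data))) {
    column <- data[, j]
    # Differences between successive non-NA values
    steps <- diff(column[!is.na(column)])
    if (all(steps >= 0)) {
      # Never decreases: replace with deltas, keeping the first
      # element so the column length stays the same
      data[, j] <- c(column[1], diff(column))
    }
  }
  data  # return the modified copy
}

col1 <- c(1,2,3,4,NA,2,3,1)
col2 <- c(1,2,3,5,NA,5,6,7)
col3 <- c(5,4,6,7,NA,9,3,5)
d <- cbind(col1, col2, col3)

# The function works on a copy, so assign the result back to d
d <- fix_data(d)
d[, "col2"]
# 1 1 1 2 NA NA 1 1
```

Only col2 changes here; col1 and col3 are left alone because each of them decreases at some point.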

Incidentally, you can use diff to check whether a column ever decreases, rather than looping over every value:

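# TRUE for columns whose non-NA values never decrease
idx <- apply(d, 2, function(x) !any(diff(x[!is.na(x)]) < 0))
# Replace those columns with their deltas, keeping the first value so the length is unchanged
d[, idx] <- apply(d[, idx, drop = FALSE], 2, function(x) c(x[1], diff(x)))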
Shane
+2  A: 

diff calculates the difference between consecutive values in a vector. You can apply it to each column in a dataframe using, e.g.

dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2)
as.data.frame(lapply(dfr, diff))

  x y
1 1 3
2 3 5
3 2 7
4 1 9
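Note that diff returns one element fewer than its input, which is why the result above has four rows instead of five. If the original length has to be kept, one option (the same trick the question uses) is to prepend the first element. A small sketch, where keep_length and dfr_delta are names made up for the example:

```r
dfr <- data.frame(x = c(1,2,5,7,8), y = (1:5)^2)

# Prepend the first element so the result is as long as the input
keep_length <- function(x) c(x[1], diff(x))

dfr_delta <- as.data.frame(lapply(dfr, keep_length))
dfr_delta

#   x y
# 1 1 1
# 2 1 3
# 3 3 5
# 4 2 7
# 5 1 9
```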

EDIT: I just noticed a few more things. You are using a matrix, not a data frame (as you stated in the question). For your matrix 'd', you can use

d_diff <- apply(d, 2, diff)
#Find columns that are (strictly) increasing
incr <- apply(d_diff, 2, function(x) all(x > 0, na.rm=TRUE))
#Replace values in the appropriate columns
d[2:nrow(d),incr] <- d_diff[,incr]
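One thing to be aware of with the matrix above: diff gives NA for both steps that touch an NA, so a single NA in a column turns into two NAs in the deltas. If you would rather take differences between the non-NA values only and leave the NA where it is, something like the following might do. It is only a sketch; delta_skip_na is a made-up helper name, and whether skipping over NAs is the right behaviour depends on your data.

```r
# Hypothetical helper: deltas between successive non-NA values,
# with NAs left in their original positions
delta_skip_na <- function(x) {
  ok <- !is.na(x)
  if (!any(ok)) return(x)  # nothing to difference
  vals <- x[ok]
  # Keep the first non-NA value, then successive differences
  x[ok] <- c(vals[1], diff(vals))
  x
}

col2 <- c(1,2,3,5,NA,5,6,7)

# Plain diff: the NA spreads to its neighbour
c(col2[1], diff(col2))
# 1 1 1 2 NA NA 1 1

# NA-aware version: only the original NA remains
delta_skip_na(col2)
# 1 1 1 2 NA 0 1 1
```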
Richie Cotton