Hi all,

I wish to implement a "Last Observation Carried Forward" for a data set I am working on which has missing values at the end of it.

Here is some simple code to do it (my question follows it):

LOCF <- function(x)
{
    # Last Observation Carried Forward (for a left to right series)
    # Note: only the trailing NAs are filled; NAs before the last observation are left alone
    last <- max(which(!is.na(x))) # the location of the Last Observation to Carry Forward
    x[last:length(x)] <- x[last]
    return(x)
}


# example:
LOCF(c(1,2,3,4,NA,NA))
LOCF(c(1,NA,3,4,NA,NA))

Now this works great for simple vectors. But if I were to try to use it on a data frame:

a <- data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA))
a
t(apply(a, 1, LOCF)) # will make a mess

It will turn my data frame into a character matrix.
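
The coercion can be seen directly, without LOCF being involved at all. This is only a small sketch in base R (the names `m` and `row_classes` are made up for the illustration): apply() converts the data frame with as.matrix() before it does anything else, and a matrix can hold only one type, so a single non-numeric column turns every cell into a string.

```r
# Same example data as in the question
a <- data.frame(rep("a", 4), 1:4, 1:4, c(1, NA, NA, NA))

# apply() works on as.matrix(a); one non-numeric column makes the whole matrix character
m <- as.matrix(a)
is.character(m)

# every row handed to the function is therefore a character vector
row_classes <- apply(a, 1, class)
row_classes
```

So whatever the function returns for each row, the result comes back as character, and the column classes have to be restored afterwards.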

Can you think of a way to do LOCF on a data.frame without turning it into a matrix? (I could use loops and such to correct the mess, but I would love a more elegant solution.)

Cheers,

Tal

+4  A: 

This already exists:

library(zoo)
na.locf(data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA)))
Shane
+1 and rseek.org of course immediately hits this as first results.
Dirk Eddelbuettel
My bad for not rseeking it - thanks Shane. But I am afraid it doesn't do the job (it fills down the last column instead of along each row).
Tal Galili
You could have also found this if you searched stackoverflow.com for `[r] locf`.
Shane
Hi Shane, I also wasn't able to find a solution in that search (although this thread is nice: http://stackoverflow.com/questions/1782704/propagating-data-within-a-vector/1783275#1783275 )
Tal Galili
Look at the accepted answer to that thread. That's what I was referring to. I don't think this question is a duplicate because the other questioner was asking about vectors and you're asking about data frames, but they're very closely related (and the answer is the same).
Shane
Hi Shane, the function can be used like this: `t(na.locf(t(data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA)))))`. But it will not "solve" the question, since I would need to go through the resulting "matrix" and turn it back into a data.frame. And thanks for taking the time to help :) Tal
Tal Galili
Oh...you want to carry column values "forward"? That isn't usually what people do. An "observation" is a row value in R, so LOCF means carry row values downward. You're carrying values across columns. I can't even imagine a circumstance in which one would do that?
Shane
Hi Shane, it's very simple. I have a wide (instead of long) data.frame. I can turn it into a long one and then use a function from the other SO thread. The only problem with that would be the case of the first value being missing...
Tal Galili
If the first value is missing, then you can make a judgement about what to do to handle it. No function will solve that problem for you. You will need to either leave the whole thing as missing, or set a default first value (like zero, for instance).
Shane
Point taken, thanks Shane.
Tal Galili
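
To make the "first value is missing" point concrete, here is a rough sketch of a vector LOCF where that decision is an explicit argument. `locf_vec` and its `default` parameter are hypothetical names for this illustration (they are not part of zoo), and the sketch assumes a plain atomic vector, not a factor.

```r
# Hypothetical helper: LOCF for one vector, with an explicit policy for leading NAs
locf_vec <- function(x, default = NA) {
  ok  <- !is.na(x)
  idx <- cumsum(ok)            # number of observed values so far; 0 before the first one
  c(default, x[ok])[idx + 1]   # position 1 holds the value used for leading NAs
}

locf_vec(c(NA, 2, NA, 4, NA))               # leading NA stays missing
locf_vec(c(NA, 2, NA, 4, NA), default = 0)  # leading NA becomes 0
```

With the default of NA the leading gap is left as missing; passing a value such as 0 is the "set a default first value" option from the comment above.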
A: 

I ended up solving this using a loop:

fillInTheBlanks <- function(S) {
  L <- !is.na(S)
  c(S[L][1], S[L])[cumsum(L)+1]
}


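LOCF.DF <- function(xx)
{
    # won't work well if the first observation in a row is NA

    # remember each column's class (first entry only, so the comparisons below get a single string)
    orig.class <- lapply(xx, function(col) class(col)[1])

    # apply() coerces the data.frame to a character matrix, so rebuild a data.frame from the result
    new.xx <- data.frame(t( apply(xx,1, fillInTheBlanks) ), stringsAsFactors = FALSE)
    names(new.xx) <- names(xx)

    # convert each column back to its original class
    for(i in seq_along(orig.class))
    {
        if(orig.class[[i]] == "factor") new.xx[,i] <- as.factor(new.xx[,i])
        if(orig.class[[i]] == "numeric") new.xx[,i] <- as.numeric(new.xx[,i])
        if(orig.class[[i]] == "integer") new.xx[,i] <- as.integer(new.xx[,i])
    }

    #t(na.locf(t(a)))

    return(new.xx)
}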

a <- data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA))
LOCF.DF(a)
Tal Galili
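
For comparison, one possible way to skip the matrix round trip entirely is to walk the columns from left to right and fill each gap from the column beside it. This is only a sketch under some assumptions: `locf_across` and its `cols` argument are hypothetical names, the example columns are given names (`id`, `t1`, `t2`, `t3`) that the original data frame does not have, and by default only the numeric columns are treated as the series.

```r
# Hypothetical helper: carry values left to right across the chosen columns,
# one column at a time, so the data.frame is never coerced to a matrix
locf_across <- function(df, cols = names(df)[sapply(df, is.numeric)]) {
  for (k in seq_along(cols)[-1]) {
    prev <- df[[cols[k - 1]]]   # column to the left (already filled)
    cur  <- df[[cols[k]]]
    gap  <- is.na(cur)
    cur[gap] <- prev[gap]       # fill each gap from the column to the left
    df[[cols[k]]] <- cur
  }
  df
}

# Same values as the example above, with column names added for readability
a <- data.frame(id = rep("a", 4), t1 = 1:4, t2 = 1:4, t3 = c(1, NA, NA, NA))
b <- locf_across(a)
b
sapply(b, class)
```

Columns outside `cols` are never touched, so the label column keeps its class. A row whose first series value is NA stays NA there, and filling an integer column from a double column would turn it into double, so this is not a drop-in replacement for every case.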