ansaurus

Question

How to handle empty data frames in R?

Answer 1

+1 A:

I don't think the problem comes from the data.frame having 0 rows:

X <- data.frame(a=numeric(0))
str(X)
# 'data.frame':   0 obs. of  1 variable:
# $ a: num 
apply(X,1,sum)
# integer(0)

Try calling traceback() right after the error to see exactly what causes it.
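traceback() is meant for interactive use, immediately after the error. In a script, one alternative is to capture the condition with tryCatch and inspect it. The sketch below assumes a logical vector `vec` and a zero-row data frame `DF` with `start` and `end` columns; both are stand-ins for the objects in the question, which are not shown here.

```r
vec <- c(TRUE, FALSE, TRUE)                              # assumed stand-in
DF  <- data.frame(start = integer(0), end = integer(0))  # assumed stand-in

# Return the condition object instead of stopping on the error.
err <- tryCatch(
  apply(DF, 1, function(row) !any(vec[row[["start"]]:row[["end"]]])),
  error = function(e) e
)

if (inherits(err, "error")) {
  print(conditionCall(err))     # the call that signalled the error
  print(conditionMessage(err))  # the text of the error message
}
```

The printed call shows which expression inside the applied function failed, which is usually enough to tell whether apply itself or the function is at fault.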

Marek 2010-09-07 08:30:49

Answer 2

+1 A:

I would use mapply instead:

kk <- data.frame( start = integer(0), end = integer(0) )
kkk <- data.frame( start = 1, end = 3 )

vect <- rnorm( 100 ) > 0

# index vect with the whole range x:y, once per (start, end) pair
with(kk,  mapply( function(x, y) !any( vect[x:y] ), start, end ) )
with(kkk, mapply( function(x, y) !any( vect[x:y] ), start, end ) )
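A possible variation, sketched under the assumption that `vec` is a logical vector and that every `start`/`end` pair is a valid index range into it: with no rows to process, mapply returns an empty list, whereas vapply over `seq_len(nrow(...))` returns a typed `logical(0)`. The helper name `ranges_all_false` is made up for this illustration.

```r
# Hypothetical helper: for each row of `ranges`, is every element of
# vec[start:end] FALSE?
ranges_all_false <- function(vec, ranges) {
  vapply(
    seq_len(nrow(ranges)),   # integer(0) when there are no rows
    function(i) !any(vec[ranges$start[i]:ranges$end[i]]),
    logical(1)               # every result must be a single logical
  )
}

vec <- c(TRUE, FALSE, FALSE, FALSE, TRUE)
kk  <- data.frame(start = integer(0), end = integer(0))
kkk <- data.frame(start = c(2, 1), end = c(4, 3))

ranges_all_false(vec, kk)   # logical(0)
ranges_all_false(vec, kkk)  # TRUE FALSE
```

Because the function body is never evaluated when there are no rows, the empty case needs no special handling.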

datanalytics.com 2010-09-07 08:40:05

Answer 3

+2 A:

This has absolutely nothing to do with apply. The function you are applying does not work when the data.frame is empty.

> myFUN <- function(row) !any(vec[ row[["start"]]:row[["end"]] ])
> myFUN(DF[1,])  # non-empty data.frame
[1] FALSE
> myFUN(data.frame()[1,])  # empty data.frame
Error in row[["start"]]:row[["end"]] : argument of length 0

Add a condition to your function.

> apply(X=data.frame(),MARGIN=1,  # empty data.frame
+  FUN=function(row) {
+    if(length(row)==0) return()
+    !any(vec[ row[["start"]]:row[["end"]] ])
+  })
NULL

Joshua Ulrich 2010-09-07 13:53:10

I'm not sure I understand how `apply(MARGIN=1)` works. I assumed it sends each row to `FUN` and aggregates the results. If that were the case, an empty data frame shouldn't have failed, since `FUN` would never have been called. So I guess this isn't the case. I looked at the documentation but still couldn't figure out exactly how it works.

David B 2010-09-07 14:28:04

`apply` does not aggregate. It puts the results of the calls to `FUN` on portions ("margins") of `X` into an object. The resulting object is defined in the first paragraph in the "Value" section of `?apply`. I'm not sure why you assumed `FUN` wouldn't be called if `X` is empty; the documentation doesn't even hint at that behavior.

Joshua Ulrich 2010-09-07 14:53:56

Answer 4

+1 A:

On a side note: apply always calls the function you pass at least once, even when there are no rows to iterate over. If the input is a data frame with defined variables but no rows, it passes FALSE as the argument to the function. If the data frame is completely empty (no rows and no variables), it passes logical(0).

> x <- data.frame(a=numeric(0))
> str(x)
'data.frame':   0 obs. of  1 variable:
 $ a: num 

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
[1] FALSE

> x <- data.frame()

> str(x)
'data.frame':   0 obs. of  0 variables

> y <- apply(x,MARGIN=1,FUN=function(x){print(x)})
logical(0)

So, as Joshua already told you, either check whether the data frame has any rows before calling apply, or add a condition to the function used within apply.
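The first option, checking before calling apply, might look like the sketch below. The wrapper name `check_ranges` and the sample `vec` are assumptions made for illustration; the applied function is the one from Joshua's answer.

```r
vec <- c(TRUE, FALSE, FALSE, FALSE, TRUE)  # assumed sample data

# Hypothetical wrapper: only call apply() when there is at least one row.
check_ranges <- function(DF) {
  if (nrow(DF) == 0) {
    return(logical(0))  # nothing to check, so FUN is never called
  }
  apply(DF, 1, function(row) !any(vec[row[["start"]]:row[["end"]]]))
}

check_ranges(data.frame(start = integer(0), end = integer(0)))
check_ranges(data.frame(start = c(2, 1), end = c(4, 3)))
```

Guarding on `nrow()` keeps the applied function free of special cases, and it does not depend on which placeholder value apply passes for an empty input.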

EDIT: This means that length(x)==0 on its own is not a sufficient check. If both possibilities can arise, you need to test whether either length(x)==0 or !x is TRUE (code taken from Joshua):

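apply(X=data.frame(),MARGIN=1,  # empty data.frame
  FUN=function(row) {
    # row[1] keeps the condition length-one; recent R versions raise an
    # error when || is given a longer vector
    if(length(row)==0 || !row[1]) {return()}
    !any(vec[ row[["start"]]:row[["end"]] ])
  })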

Joris Meys 2010-09-07 14:58:50

I think it might be better to use `if(length(row)==0 || !row)` (`||` instead of `|`), otherwise we might get warnings saying `the condition has length > 1 and only the first element will be used`

David B 2010-09-08 06:10:29

Very true! Thx for the correction

Joris Meys 2010-09-08 07:15:13

p.s. where is this behavior of `apply` that you have mentioned documented?

David B 2010-09-08 08:42:31

@David: in the code above. Sometimes trying it out yourself already gives you a lot of insight. I remember trying it out a while ago.

Joris Meys 2010-09-08 09:22:00
