I need to read a data frame from a file containing NULL values. Here's an example file:

charCol floatCol intCol
a       1.5      10
b       NULL     3
c       3.9      NULL
d       -3.4     4

I read this file into a data frame:

> df <- read.table('example.dat', header=TRUE)

But "NULL" entries are not interpreted by R as NULL:

> is.null(df$floatCol[2])
[1] FALSE

How should I format my input file so that R properly treats such entries as NULL?

A: 

I have never done anything in R, but I would assume that your variable holds the string "NULL", so try checking whether it is equal to the string "NULL" instead. If you have to use is.null(), you could go through your variables and convert "NULL" to NULL.
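
A minimal sketch of that after-the-fact conversion, assuming the column came in as text (or as a factor in older R versions). A vector element cannot hold NULL, so the sketch substitutes NA instead; `raw` is a hypothetical stand-in for the column, not something from the question.

```r
# Hypothetical stand-in for a column that was read with the string "NULL" in it
raw <- c("1.5", "NULL", "3.9", "-3.4")

# as.character() makes the same steps work if the column is a factor
txt <- as.character(raw)
txt[txt == "NULL"] <- NA      # mark the placeholder as missing
floatCol <- as.numeric(txt)   # convert the remaining strings to numbers

is.na(floatCol)               # TRUE only for the former "NULL" entry
```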

Erik B
In R, there is an important difference between `NA` (length 1) and `NULL` (length 0).
hadley
+3  A: 

Try this:

> Lines <- "charCol floatCol intCol
+ a       1.5      10
+ b       NULL     3
+ c       3.9      NULL
+ d       -3.4     4"
> 
> # DF <- read.table("myfile", header = TRUE, na.strings = "NULL")
> DF <- read.table(textConnection(Lines), header = TRUE, na.strings = "NULL")
> DF
  charCol floatCol intCol
1       a      1.5     10
2       b       NA      3
3       c      3.9     NA
4       d     -3.4      4
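
To confirm that the columns now have the intended types, something like the following sketch may help. It uses the `text=` argument of read.table as an alternative to textConnection(); the checks themselves are just one way to inspect the result.

```r
Lines <- "charCol floatCol intCol
a       1.5      10
b       NULL     3
c       3.9      NULL
d       -3.4     4"

# text= reads from a string directly, without an explicit connection
DF <- read.table(text = Lines, header = TRUE, na.strings = "NULL")

sapply(DF, class)    # floatCol should be "numeric", intCol "integer"
colSums(is.na(DF))   # number of missing values in each column
```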
G. Grothendieck
+3  A: 

Always always always do summary(thing) if something is unexpected.

> summary(df)
 charCol floatCol  intCol 
 a:1     1.5 :1   10  :1  
 b:1     -3.4:1   3   :1  
 c:1     3.9 :1   4   :1  
 d:1     NULL:1   NULL:1  

That looks a bit weird. Drill down:

> summary(df$floatCol)
 1.5 -3.4  3.9 NULL 
   1    1    1    1 

What the heck is it?

> class(df$floatCol)
[1] "factor"

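The presence of an invalid numeric format (the string 'NULL') has caused R to go "oh I guess these aren't numbers, I'll read them into character strings and make a factor (categorical variable) for you". (From R 4.0.0 on, where stringsAsFactors defaults to FALSE, the column stays a plain character vector instead of becoming a factor.)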

The solution just posted is to use na.strings="NULL", but remember that NA isn't the same as NULL in R: NA is a marker for missing data, while NULL is a genuine non-value. Compare:

> c(1,2,3,NULL,4)
[1] 1 2 3 4
> c(1,2,3,NA,4)
[1]  1  2  3 NA  4

Once you've read it in correctly, the appropriate test is usually is.na(foo).
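
A short sketch of what that looks like in practice; `x` is a made-up vector standing in for a column such as floatCol after a correct read:

```r
# Hypothetical column with one missing value
x <- c(1.5, NA, 3.9, -3.4)

is.na(x)                # element-wise test for missing values
is.null(x[2])           # FALSE: x[2] is NA, a length-1 value, not NULL
mean(x)                 # NA, because missing values propagate
mean(x, na.rm = TRUE)   # mean of the three observed values only
length(NA)              # 1
length(NULL)            # 0
```

Many summary functions (mean, sum, max and so on) take na.rm = TRUE, which is often more convenient than filtering with is.na() by hand.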

Spacedman
+1 for pointing out that this cannot happen.
mbq