This is a newbie question about R. I am importing a CSV file into R using the sqldf package. The file has missing values in both numeric and string variables. I notice that the missing values are left empty in the data frame (as opposed to being filled with NA or something else). I want to replace them with a user-defined value. Obviously, a function like is.na() will not work in this case, since the values are empty rather than NA. Thank you in advance. Toy data frame with three columns:
A B C
3 4
2 4 6
34 23 43
2 5

I want

A B C
3 4 NA
2 4 6
34 23 43
2 5 NA

A:

Assuming you are using read.csv.sql in sqldf with the default SQLite database, it produces a factor column for C in which the missing values appear as the empty-string level "". There are several ways to get NA instead:

(1) Convert the values to numeric using as.numeric(as.character(...)), like this:

> Lines <- "A,B,C
+ 3,4,
+ 2,4,6
+ 34,23,43
+ 2,5,
+ "
> cat(Lines, file = "stest.csv")
> library(sqldf)
> DF <- read.csv.sql("stest.csv")
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: Factor w/ 3 levels "","43","6": 1 3 2 1
> DF$C <- as.numeric(as.character(DF$C))
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: num  NA 6 43 NA
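
The as.character() step in (1) matters because calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A minimal sketch, rebuilding column C by hand (the names `f`, `codes` and `vals` are illustrative and not part of the original session):

```r
# Rebuild column C as a factor by hand; `f` is an illustrative name.
f <- factor(c("", "6", "43", ""))

# Direct conversion returns the internal level codes, not the values.
codes <- as.numeric(f)

# Going through character first recovers the values; "" becomes NA.
vals <- as.numeric(as.character(f))
```

Comparing `codes` with `vals` shows why the two-step conversion is the safer habit for factor columns.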

(2) Or, if we use read.csv.sql(..., method = "raw"), the column comes back as character rather than factor, so as.numeric alone is enough:

> DF <- read.csv.sql("stest.csv", method = "raw")
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: chr  "" "6" "43" ""
> DF$C <- as.numeric(DF$C)
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: num  NA 6 43 NA
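
The question also mentions string variables, where as.numeric() does not apply. A minimal sketch of one way to mark empty strings as NA in a character column, assuming a data frame shaped like the method = "raw" result; the column D and its values are hypothetical and built by hand here rather than read through sqldf:

```r
# Hand-built stand-in for a method = "raw" result, plus a hypothetical string column D.
DF <- data.frame(A = c(3L, 2L, 34L, 2L),
                 C = c("", "6", "43", ""),
                 D = c("x", "", "z", ""),
                 stringsAsFactors = FALSE)

# Numeric-looking column: as.numeric() turns "" into NA.
DF$C <- as.numeric(DF$C)

# String column: mark empty strings as missing explicitly.
DF$D[DF$D == ""] <- NA
```

After this, is.na() identifies the missing entries in both columns.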

(3) If it's feasible for you to use read.csv instead, the NA filling happens right away:

> str(read.csv("stest.csv"))
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: int  NA 6 43 NA
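
Once the missing values are NA, is.na() does work, so replacing them with a user-defined value is a one-line assignment per column. A minimal sketch, assuming read.csv is an option; the extra string column D and the fill values 0 and "unknown" are arbitrary choices for illustration. By default read.csv leaves empty string fields as "", so na.strings is set to treat them as NA too:

```r
# Same toy data as stest.csv, plus a hypothetical string column D.
Lines <- "A,B,C,D
3,4,,x
2,4,6,
34,23,43,z
2,5,,
"

# na.strings = c("", "NA") also marks empty string fields as NA.
DF <- read.csv(text = Lines, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Replace missing values with user-defined fills (0 and "unknown" are arbitrary).
DF$C[is.na(DF$C)] <- 0
DF$D[is.na(DF$D)] <- "unknown"
```

The same two assignments can be applied to the data frames produced in (1) and (2) once their columns contain NA.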
G. Grothendieck