Hello. I have a data frame in R that contains integers, NAs, and a random assortment of strings in its cells, with only one data type per cell. How can I change all of the cells that contain strings into NA?
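Try is.factor in a loop. In R versions before 4.0.0, data.frame() stores a character vector as a factor by default (stringsAsFactors = TRUE); from 4.0.0 on the column stays character, so test is.character as well.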
> a <- 1:5
> b <- c("do", "ray", "mi", "fa", "so")
> df <- data.frame(a, b)
> df
  a   b
1 1  do
2 2 ray
3 3  mi
4 4  fa
5 5  so
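> # seq_len(ncol(df)) visits every column; ncol(df) alone would visit only the last one
> for(i in seq_len(ncol(df))) {
+ if(is.factor(df[, i]) || is.character(df[, i])) df[, i] <- NA
+ }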
> df
  a  b
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA
First off, if it is a data.frame, then the type is the same within each column. So do something like class(data[,3]) to inquire about the class of the third column. You can then use as.numeric() et al. on a given column to transform it (for a factor column, go through as.character() first, because as.numeric() on a factor returns the internal level codes). Or, per your question, data[,3] <- NA in case you know you want to replace that whole column.
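A minimal sketch of that inspect-then-convert step. The data frame `dat` and its column names are made up for illustration, and stringsAsFactors = TRUE is set explicitly so the strings become a factor regardless of R version:

```r
# Hypothetical example data: one clean integer column, one column polluted by strings.
dat <- data.frame(
  id    = 1:4,
  score = c("10", "oops", NA, "7"),
  stringsAsFactors = TRUE  # mimic the pre-4.0.0 default
)

# Class of every column at once.
col_classes <- sapply(dat, class)
print(col_classes)

# Convert the factor column via as.character(); non-numeric strings become NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
score_num <- suppressWarnings(as.numeric(as.character(dat$score)))
print(score_num)
```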
If your data frame (df) is really all integers except for NAs and garbage, then the following converts it.
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
You'll get a warning about NAs introduced by coercion, but that's just all those non-numeric character strings turning into NAs.
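As a small worked example of that conversion (the data frame `raw` and the helper `to_num` are hypothetical names, not part of the original answer):

```r
# Hypothetical input: integers, NAs and stray strings mixed together.
raw <- data.frame(
  a = c("1", "x", "3"),
  b = c("4", NA, "hello"),
  stringsAsFactors = FALSE
)

# as.character() first, so the same helper is safe for factor columns too.
to_num <- function(x) as.numeric(as.character(x))

# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
clean <- suppressWarnings(data.frame(lapply(raw, to_num)))
print(clean)
```

Wrapping the call in suppressWarnings() is optional; leave it off if you would rather see which conversions produced NAs.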
The following code also works and is more concise but runs slower. Note that apply returns a matrix rather than a data frame.
df2 <- apply(df, 2, function(x) as.numeric(as.character(x)))
If you just want to convert selected columns then you could use a slightly more complicated command. First you need to figure out which columns you want to convert, for example by saving a logical vector (columnsToChange below) with one TRUE or FALSE per column.
df2 <- cbind(df[, !columnsToChange, drop = FALSE], apply(df[, columnsToChange, drop = FALSE], 2, function(x) as.numeric(as.character(x))))
This would knock the columns out of order (the unconverted ones come first), but it would get you what you want easily enough.
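If the column order matters, one alternative (not from the answer above, just a sketch) is to assign the converted columns back in place. The data frame `dat` and its contents are hypothetical:

```r
# Hypothetical data: only the middle column should become numeric.
dat <- data.frame(
  name  = c("u", "v", "w"),
  count = c("1", "two", "3"),
  note  = c("p", "q", "r"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column, marking the columns to convert.
columnsToChange <- c(FALSE, TRUE, FALSE)

# Replace the selected columns in place; the rest keep their position and type.
dat[columnsToChange] <- lapply(
  dat[columnsToChange],
  function(x) suppressWarnings(as.numeric(as.character(x)))
)
str(dat)
```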