I have a data.frame, d1, with 7 columns; the 5th through 7th columns are supposed to be numeric:

str(d1[5])
'data.frame':   871 obs. of  1 variable:
 $ Latest.Assets..Mns.: num  14008 1483 11524 1081 2742 ... 

is.numeric(d1[5])
[1] FALSE

as.numeric(d1[5])
Error: (list) object cannot be coerced to type 'double'

How can this be? If str identifies it as numeric, how can it not be numeric? I'm importing from CSV.

+2  A: 

It may be a list (based on the error message). Have you tried class(d1[5])? If it's a list, then you would expect either d1[[5]] or d1[5][[1]] to be numeric.

Edit:

Given that d1[5] is itself a data frame, you need to treat it as such. Something like this should work:

is.numeric(d1[5][,1])
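
To make the difference between the indexing forms concrete, here is a minimal sketch on a toy data frame. The object d and its column names are made up for illustration; the real d1 comes from the asker's CSV.

```r
# Toy stand-in for d1 (hypothetical data, not the asker's file)
d <- data.frame(id = c("a", "b", "c"), assets = c(14008, 1483, 11524))

class(d[2])         # "data.frame": single brackets keep the data frame wrapper
class(d[[2]])       # "numeric": double brackets extract the column vector
class(d[2][, 1])    # "numeric": same vector, reached via the one-column data frame

is.numeric(d[2])    # FALSE
is.numeric(d[[2]])  # TRUE
identical(d[[2]], d[2][, 1])  # TRUE
```

So d1[[5]] and d1[5][,1] should give the same vector; the double-bracket form is just shorter.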
Shane
class(d1[5]) returns [1] "data.frame". What does the extra set of square brackets do in d1[[5]]?
Brandon Bertelsen
The extra set of brackets there would help if d1 was actually a list. But basically you need to extract the value from your data.frame's data.frame...
Shane
1beb, for a better understanding of the difference between [] and [[]] look at the indexing section of the R Lang Def: http://cran.r-project.org/doc/manuals/R-lang.html#Indexing
JD Long
d1 is a list. See my answer for how to extract the value.
Alex Brown
+2  A: 
> is.numeric_data.frame=function(x)all(sapply(x,is.numeric))

> is.numeric_data.frame(d1[[5]])
[1] TRUE 

Why

d1 is a list (a data frame is a list of columns), hence d1[5] is a list of length 1: in this case, a one-column data.frame. To get the column itself, use d1[[5]].

Even if a data frame contains numeric data, it isn't numeric itself:

> x = data.frame(1:5,6:10)
> is.numeric(x)
[1] FALSE

Individual columns in a data frame are either numeric or not numeric. For instance:

> z <- data.frame(1:5,letters[1:5])

> is.numeric(z[[1]])
[1] TRUE
> is.numeric(z[[2]])
[1] FALSE

If you want to know if ALL columns in a data frame are numeric, you can use all and sapply:

> sapply(z,is.numeric)
    X1.5 letters.1.5. 
    TRUE        FALSE 

> all(sapply(z,is.numeric))
[1] FALSE

> all(sapply(x,is.numeric))
[1] TRUE

You can wrap this all up in a convenient function:

> is.numeric_data.frame=function(x)all(sapply(x,is.numeric))

> is.numeric_data.frame(d1[[5]])
[1] TRUE 
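
A slightly stricter variant of the same idea, as a sketch only: vapply() requires each per-column result to be a single logical, and the input check rejects anything that is not a data frame. The name all_numeric_columns is hypothetical, not part of base R.

```r
# Hypothetical helper: TRUE only if every column of a data frame is numeric
all_numeric_columns <- function(df) {
  stopifnot(is.data.frame(df))             # refuse plain vectors and other objects
  all(vapply(df, is.numeric, logical(1)))  # one TRUE/FALSE per column
}

x <- data.frame(a = 1:5, b = 6:10)
z <- data.frame(a = 1:5, b = letters[1:5])

all_numeric_columns(x)     # TRUE
all_numeric_columns(z)     # FALSE
all_numeric_columns(x[1])  # TRUE: x[1] is still a data frame
```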
Alex Brown
+2  A: 

d1[5] is not a single value. It's a vector (possibly a list?) of values. If you grab a single value I bet it is numeric. For example:

is.numeric(d1[5][[1]])
as.numeric(d1[5][[1]])

So I think the confusion is between the column object and the elements in the column. R makes a distinction between those two ideas while other languages, like SQL, functionally assume that when discussing the column you're usually referring to the elements of the column.
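
If the goal is to convert columns rather than just test them, one approach is to work on the column vectors and assign them back. This is a sketch on made-up data, with the columns stored as character to mimic a messy CSV import.

```r
# Toy data frame; names and values are hypothetical
d <- data.frame(name = c("a", "b"),
                assets = c("14008", "1483"),
                liabilities = c("11524", "1081"),
                stringsAsFactors = FALSE)

# as.numeric(d[2]) fails, because d[2] is a data frame (a list), not a vector.
# Convert one column by extracting it with [[ ]]:
d[[2]] <- as.numeric(d[[2]])

# Convert several columns at once by assigning back a list of vectors:
cols <- 2:3
d[cols] <- lapply(d[cols], as.numeric)

is.numeric(d$assets)       # TRUE
is.numeric(d$liabilities)  # TRUE
```

One caveat: if a column was imported as a factor, as.numeric() returns the internal integer codes, so convert with as.numeric(as.character(...)) in that case.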

This discussion of indexing from the R Language Definition doc really helped me wrap my head around how to reference items in R.

JD Long
The list member is a data frame, which is never numeric.
Alex Brown
Alex: what are you talking about? class(list(x=1)[[1]]) == "numeric"
Shane