I'm taking a random sample from a data frame (a). The identifier column a[,1] is to be exported to Excel, but I've run into trouble.

str(a)

'data.frame':   2299 obs. of  5 variables:
 $ A          : Factor w/ 2284 levels "01012223427",..: 1339 78 779 1590 1687 64 1034 633 1321 109 


a[sample(a[,1],300),]->q

This results in 300 random samples, but several of them are NA. Any ideas?

q[,1]->r

str(r)
 Factor w/ 2284 levels "01012223427",..: 85 1162 1886 549 1996 789 185 321 632 2273

I need to get the r vector into Excel in the 01012223427 format, but write.csv(r,"r.csv") produces a file where every cell of the column holds a concatenated 1,"01012223427" and so on. I also tried write.csv(as.numeric(r),"r.csv"), which did not help: it writes the internal factor codes instead. How can I do this?

--edit

write.csv2(r,"300.csv",row.names=F) solved my problem, but I'm still unsure why the NAs are introduced...

//M

+3  A: 

To convert a factor of numbers into numeric, you must first change to character, otherwise you get the internal numbers of the factor, rather than the level labels:

as.numeric(as.character(r))

NA's are possibly introduced because of non-numeric characters in the factor levels.
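
To make the difference concrete, here is a minimal sketch on a toy factor (the values are invented for illustration, not taken from the asker's data). It compares the internal codes with the level labels, and shows that converting the labels to numbers drops the leading zero:

```r
# Toy factor whose levels look like numbers (made-up values)
f <- factor(c("01012223427", "02039912345", "01012223427"))

codes  <- as.numeric(f)                 # internal integer codes of the factor
labels <- as.character(f)               # level labels, leading zeros kept
nums   <- as.numeric(as.character(f))   # labels parsed as numbers, leading zeros lost

print(codes)
print(labels)
print(nums)
```

If the identifiers have to keep the 01012223427 form, as.character(r) alone is probably the safer thing to export.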

James
That helped...Appreciate it. //M
Misha
A: 

I'd also check why you have a factor there in the first place. It looks like you read the data in from a text file, and that somewhere in the column there are stray characters (a space, a point, a tab, the letters NA, ...) which cause R to treat the whole column as character and to convert it to a factor when using read.csv or the like.

Once you find it, you'll also know why you get NAs, and you can fix it before saving the data frame to a text file. Check the option stringsAsFactors=F in read.table() and read.csv() (or alternatively, as.is=T in read.csv).
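
A rough sketch of what that could look like, assuming a small inline CSV in place of the real file (the contents and the column names A and B are hypothetical). The colClasses argument is an extra assumption on my side, added so the identifier column is read as text and keeps its leading zeros:

```r
# Hypothetical file contents standing in for the real export; the "NA" entry
# mimics a stray text value in an otherwise numeric-looking column.
csv_text <- "A,B\n01012223427,1\n02039912345,2\nNA,3\n"

# Read identifiers as character instead of factor or number
d <- read.csv(text = csv_text, stringsAsFactors = FALSE,
              colClasses = c(A = "character"))

str(d)

# Positions of rows whose identifier was read as missing
bad <- which(is.na(d$A))
print(bad)
```

Looking at the rows in bad (or at values that contain anything other than digits) should point to whatever is upsetting the column.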

Besides that, this piece of code:

a[sample(a[,1],300),]->q

probably does not do what you think. I'd use the row indices themselves, something along the lines of:

a[sample.int(dim(a)[1],300),] -> q

If a[,1] becomes numeric, your code above won't work any more. It will use the values of a[,1] as row indices, one of which is 01012223427. There is no row with that index, so you'd get a row of NAs back instead of a sampled row. The code will also break when a[,1] is a character vector, since the values are then matched against the row names.
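
Putting it together, a minimal sketch on a toy data frame (the data frame, its identifiers and the temporary output file are all made up for illustration). It samples row positions rather than values, then writes the identifiers out as text:

```r
set.seed(1)  # only to make the toy example reproducible

# Toy data frame standing in for `a`; the identifiers are invented
a <- data.frame(A = sprintf("%011d", 1:2299), B = rnorm(2299),
                stringsAsFactors = FALSE)

idx <- sample.int(nrow(a), 300)   # 300 distinct row positions, no replacement
q <- a[idx, ]

# Row positions always point at existing rows, so no NA rows appear
print(anyNA(q$A))

# Write the identifiers as text so the leading zeros are kept in the file
out <- tempfile(fileext = ".csv")
write.csv(data.frame(A = q$A), out, row.names = FALSE)
```

Because sample.int() draws from 1 to nrow(a), the result does not depend on what type a[,1] happens to have.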

Joris Meys