Hi all, I'm reading a text file like this in R 2.10.0:

248585_at   250887_at 245638_s_at AFFX-BioC-5_at
248585_at   250887_at 264488_s_at 245638_s_at AFFX-BioC-5_at AFFX-BioC-3_at AFFX-BioDn-5_at
248585_at   250887_at

Using the command clusters<-read.delim("test",sep="\t",fill=TRUE,header=FALSE)

Now I need to pass every row of this file to a Bioconductor function that accepts only character vectors as input. My problem is that calling "as.character" on a row of this "clusters" object turns everything into strings of numbers.

> clusters[1,]
         V1        V2          V3             V4 V5 V6 V7
1 248585_at 250887_at 245638_s_at AFFX-BioC-5_at

But

> as.character(clusters[1,])
[1] "1" "1" "2" "3" "1" "1" "1"

Is there any way to keep the original names and put them into a character vector?

In case it helps: the "clusters" object returned by "read.delim" is of type "list".

Thanks a lot :-)

Federico

+1  A: 

I never would have expected that to happen, but trying a small test case produces the same results you're giving.

Since the result of df[1,] is itself a data.frame, one fix I thought to try was to use unlist -- seems to work:

> df <- data.frame(a=LETTERS[1:10], b=LETTERS[11:20], c=LETTERS[5:14])
> df[1,]
  a b c
1 A K E
> as.character(df[1,])
[1] "1" "1" "1"
> as.character(unlist(df[2,]))
[1] "B" "L" "F"

I think turning the data.frame into a matrix first would also get around this:

> m <- as.matrix(df)
> as.character(m[2,])
[1] "B" "L" "F"
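For anyone who wants to try the two workarounds above side by side, here is a minimal sketch. It assumes the columns are factors, so stringsAsFactors=TRUE is passed explicitly (newer R versions no longer create factors by default); the names row2_unlist and row2_matrix are just illustrative.

```r
# Small data frame with factor columns, requested explicitly so the
# example behaves the same regardless of the R version's default.
df <- data.frame(a = LETTERS[1:10], b = LETTERS[11:20], c = LETTERS[5:14],
                 stringsAsFactors = TRUE)

# Workaround 1: unlist() merges the one-row data frame into a single
# factor, and as.character() on a factor returns the labels.
row2_unlist <- unname(as.character(unlist(df[2, ])))

# Workaround 2: as.matrix() converts the factor columns to a character
# matrix, so a row is already a character vector.
m <- as.matrix(df)
row2_matrix <- unname(m[2, ])

print(row2_unlist)
print(row2_matrix)
```

Both results should be plain character vectors holding the labels of row 2, not the factor codes.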

To avoid issues with factors in your data.frame you might want to set stringsAsFactors=FALSE when reading in your data from the text file, e.g.:

clusters <- read.delim("test", sep="\t", fill=TRUE, header=FALSE,
                       stringsAsFactors=FALSE)
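As a rough sketch of how that read could feed the per-row processing asked about in the question: the code below writes the three sample rows to a temporary file (tempfile() stands in for the real "test" file, and the tab separators are an assumption based on sep="\t"), reads them without factors, and turns each row into a character vector. Since fill=TRUE pads short rows with empty fields, the sketch also drops those; row_ids is an illustrative name.

```r
# Recreate the sample input in a temporary file (stand-in for "test").
path <- tempfile()
writeLines(c(
  "248585_at\t250887_at\t245638_s_at\tAFFX-BioC-5_at",
  paste("248585_at", "250887_at", "264488_s_at", "245638_s_at",
        "AFFX-BioC-5_at", "AFFX-BioC-3_at", "AFFX-BioDn-5_at", sep = "\t"),
  "248585_at\t250887_at"
), path)

clusters <- read.delim(path, sep = "\t", fill = TRUE, header = FALSE,
                       stringsAsFactors = FALSE)

# One character vector per row, with the empty padding fields removed.
row_ids <- lapply(seq_len(nrow(clusters)), function(i) {
  ids <- unlist(clusters[i, ], use.names = FALSE)
  ids[!is.na(ids) & nzchar(ids)]
})

print(row_ids[[1]])
```

Each element of row_ids can then be handed to whatever function expects a character vector.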

And, after all that, the unexpected behavior seems to come from the fact that the original Affy probe IDs in your data.frame are treated as factors, so as.character returns their integer codes rather than their labels. Setting stringsAsFactors=FALSE side-steps the whole problem:

> df <- data.frame(a=LETTERS[1:10], b=LETTERS[11:20],
+                  c=LETTERS[5:14], stringsAsFactors=FALSE)
> as.character(df[1,])
[1] "A" "K" "E"
Steve Lianoglou
+3  A: 

By default, character columns are converted to factors. You can avoid this by setting the as.is=TRUE argument:

clusters <- read.delim("test", sep="\t", fill=TRUE, header=FALSE, as.is=TRUE)

If you only need each line of the text file as a character vector, you could skip the data.frame entirely and do something like:

x <- readLines("test")
xx <- strsplit(x,split="\t")
xx[[1]] # xx is a list
# [1] "248585_at"      "250887_at"      "245638_s_at"    "AFFX-BioC-5_at"
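
A self-contained sketch of the same idea, in case it is useful: since strsplit returns one character vector per line, short rows get no padding. The temporary file again stands in for "test" (tab-separated input is assumed), and count_probes is a hypothetical placeholder for the real Bioconductor function.

```r
# Recreate the sample input in a temporary file (stand-in for "test").
path <- tempfile()
writeLines(c(
  "248585_at\t250887_at\t245638_s_at\tAFFX-BioC-5_at",
  paste("248585_at", "250887_at", "264488_s_at", "245638_s_at",
        "AFFX-BioC-5_at", "AFFX-BioC-3_at", "AFFX-BioDn-5_at", sep = "\t"),
  "248585_at\t250887_at"
), path)

x <- readLines(path)
xx <- strsplit(x, split = "\t", fixed = TRUE)  # list of character vectors

# Hypothetical placeholder for the function that needs a character vector.
count_probes <- function(ids) length(ids)

# Apply it to every row of the file.
n_per_row <- vapply(xx, count_probes, integer(1))
print(n_per_row)
```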
Marek