Hi stackoverflow-pros,

I need your help again :)

I wrote an R script that generates a heatmap from a given tab-separated txt or xls file. At the moment, I delete all the columns I don't want in the heatmap by hand in the xls file. Now I want to automate this, but I don't know how :(

The interesting columns all start the same in all xls files, followed by an individual name:

xls-file 1: L1_tpm_xxxx L2_tpm_xxxx L3_tpm_xxxx

xls-file 2: L1_tpm_xxxx L2_tpm_xxxx L3_tpm_xxxx L4_tpm_xxxx L5_tpm_xxxx

Any ideas how to select those columns?

Thanking you in anticipation, Philipp

+2  A: 

You could use (if you have read your data into a data.frame df):

df <- df[, grep("^L[[:digit:]]+_tpm.*", colnames(df)), drop = FALSE]

or you can explicitly write the columns that you want:

df <- df[,c("L1_tpm_xxxx","L2_tpm_xxxx","L3_tpm_xxxx")]

etc...
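As a rough illustration of the pattern-based selection, here is a minimal sketch on a toy data frame. The column names and values are made up for the example, and `drop = FALSE` is added so the result stays a data.frame even if only one column matches:

```r
# Toy data frame standing in for one imported file (names and values are invented)
df <- data.frame(
  gene       = c("g1", "g2"),
  L1_tpm_abc = c(1.5, 2.0),
  L2_tpm_def = c(0.3, 4.1),
  L10_tpm_xy = c(7.2, 0.0),
  length     = c(100, 250)
)

# Positions of the columns whose names start with "L<number>_tpm"
keep <- grep("^L[0-9]+_tpm", colnames(df))

# drop = FALSE keeps the result a data.frame even when a single column matches
tpm <- df[, keep, drop = FALSE]
colnames(tpm)
```

Because the pattern uses `+` after the digit class, multi-digit names such as the hypothetical L10_tpm_xy are matched as well.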

The following link is quite useful;-)

teucer
Your code won't succeed for a column named "L10_tpm_abcd". I would suggest "^L[0-9]+_tpm"
gd047
First of all, thank you for your help! I use read.table for the txt files, but read.xls from the "gdata" package for the Excel files. I haven't had time to test it yet, but does this work for read.xls as well?
Philipp
gd047: thanks for the comment, code changed accordingly. Philipp: I guess 'read.xls' reads the data into a data.frame, so it should work as well.
teucer
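To make the point about the two readers concrete: since the selection only depends on the column names of a data.frame, it can be wrapped in a small function and applied to whatever the reader returns. This is only a sketch; `select_tpm` is a hypothetical helper name, the inline text stands in for a real file, and the read.xls call is left commented out because it assumes the gdata package is installed:

```r
# Hypothetical helper: keep only the columns named like "L<number>_tpm..."
select_tpm <- function(d) {
  d[, grep("^L[0-9]+_tpm", colnames(d)), drop = FALSE]
}

# Inline tab-separated text stands in for a real txt file here
txt <- "gene\tL1_tpm_a\tL2_tpm_b\tnotes\ng1\t1.2\t3.4\tx\ng2\t5.6\t7.8\ty\n"
from_txt <- read.table(text = txt, header = TRUE, sep = "\t")
heat_input <- select_tpm(from_txt)

# For an Excel file the same helper should apply, assuming gdata is installed:
# from_xls <- gdata::read.xls("path_to_file.xls")
# heat_input <- select_tpm(from_xls)
```

If the heatmap function expects a numeric matrix, `as.matrix(heat_input)` can be applied afterwards.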
A: 

Hi! If you think the column positions are going to be fixed across Excel sheets, the simplest solution here is to just use column indices. For example, if you use read.table to import a tab-delimited text file as a data.frame, and then decide you'd prefer to keep only the first two columns, you might do something like this:

data <- read.table("path_to_file.txt", header = TRUE, sep = "\t")
data <- data[, 1:2]
Kyle.
Damn, that would have been too easy ;-) Unfortunately they do not always have the same indices ^^ But thanks anyway!
Philipp