I am a newbie and have been searching for the past hour for how to do a simple operation in R.

I have a very large dataframe with rows as observations and columns as genetic markers. I would like to create a new column that contains the sum of a select number of columns for each observation using R.

If I have 200 columns and 100 rows, I would like to create a new column with 100 rows that holds the sum of, say, columns 43 through 167. The columns contain either 1 or 0. With the new column holding the sum for each row, I will be able to sort the individuals by how many genetic markers they have.

I feel it is something close to: data$new=sum(data$[,43:167])

Thanks in advance.

+4  A: 

You can use rowSums.

rowSums(data) should give you what you want.

Greg
And for the OP's problem: `data$new <- rowSums(data[43:167])`
Marek
oops sorry about that.
Greg
+2  A: 

The rowSums function (as Greg mentions) will do what you want, but you are mixing subsetting techniques in your attempt. Do not use "$" together with "[]"; your code should look more like:

data$new <- rowSums( data[,43:167] )
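
To see the whole thing end to end, here is a minimal sketch on made-up data. The toy data frame and the name `sorted` are assumptions for illustration only; `data` and `new` follow the names used in the question.

```r
set.seed(1)  # make the toy data reproducible

# Hypothetical stand-in for the real data: 100 rows, 200 marker columns of 0/1
data <- as.data.frame(matrix(rbinom(100 * 200, 1, 0.5), nrow = 100))

# Sum columns 43 through 167 within each row
data$new <- rowSums(data[, 43:167])

# Sort individuals from most to fewest markers
sorted <- data[order(data$new, decreasing = TRUE), ]
head(sorted$new)
```

If some markers are missing (NA), rowSums(data[, 43:167], na.rm = TRUE) should skip them instead of returning NA for that row.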

If you want to use a function other than sum, look at ?apply, which applies a general function across rows or columns.
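
As a rough illustration of apply, here is a small sketch on a made-up data frame. The names `m`, `row_max` and `row_sum` are hypothetical and not from the question.

```r
# Hypothetical toy data: 5 rows, 4 columns of 0/1 markers
m <- data.frame(a = c(1, 0, 1, 0, 1),
                b = c(0, 0, 1, 1, 1),
                c = c(1, 0, 0, 1, 1),
                d = c(0, 0, 1, 0, 1))

# MARGIN = 1 applies the function to each row; MARGIN = 2 would go column by column
m$row_max <- apply(m[, 2:4], 1, max)  # largest value among columns b, c, d
m$row_sum <- apply(m[, 2:4], 1, sum)  # same result as rowSums(m[, 2:4])
```

For plain sums, rowSums is usually the faster choice; apply is the general tool when the function is something else, such as max or median.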

Greg Snow