
I'm trying to work with data from .csv files that share a known general format but have varying group and measure names. I can get a data.frame using:

mydata <- read.csv(file.choose(), header = TRUE)
mydata

    GroupNames  Measure1    Measure2    Measure3  etc
1   group1      value1      value1
2   group1      value2      value2
3   group2      value3      value3
4   group2      value4      value4
5   group2      value5      value5
6   group3      value6      value6
7   group3      value7      value7

etc
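Because `file.choose()` opens an interactive dialog, the steps below are hard to reproduce exactly as posted. A minimal sketch of a reproducible stand-in, assuming made-up numbers in place of value1, value2, ...: `read.csv()` passes its `text` argument on to `read.table()`, so the CSV can be read from a string.

```r
# Hypothetical stand-in for a CSV picked with file.choose():
# same layout as the question (group column first, then measures),
# but the numbers are made up.
csv_text <- "GroupNames,Measure1,Measure2
group1,1.2,3.4
group1,1.5,3.1
group2,2.1,2.9
group2,2.4,3.0
group2,2.2,3.3
group3,3.0,2.5
group3,3.3,2.7"

# `text =` replaces the file, so no dialog is needed
mydata <- read.csv(text = csv_text, header = TRUE)
str(mydata)
```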

Is there a way to subset the data and run the required tests if I don't know the number of groups or measures (or their names) ahead of time?

I can get the column names (the header row) using:

names(mydata)
[1] "GroupNames" "Measure1" "Measure2" "Measure3" 

I can get the groups using:

Groups<-levels(factor(mydata[[1]]))
Groups
[1] "group1" "group2"  "group3"
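Combining those two steps, a sketch that never hard-codes a column name. It assumes, as in the example file, that the group column is always first and every other column is a measure; the data values are hypothetical.

```r
# Hypothetical sample data; only the layout matters (group column first)
mydata <- data.frame(
  GroupNames = c("group1", "group1", "group2", "group2", "group2", "group3", "group3"),
  Measure1   = c(1.2, 1.5, 2.1, 2.4, 2.2, 3.0, 3.3),
  Measure2   = c(3.4, 3.1, 2.9, 3.0, 3.3, 2.5, 2.7)
)

# Refer to columns by position, never by a hard-coded name
group_col    <- names(mydata)[1]    # whatever the first header happens to be
measure_cols <- names(mydata)[-1]   # every remaining column is a measure
Groups       <- levels(factor(mydata[[group_col]]))

group_col     # "GroupNames"
measure_cols  # "Measure1" "Measure2"
Groups        # "group1" "group2" "group3"
```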

I can create a subset using:

g1<-subset(mydata, GroupNames %in% Groups[1])
g1
    GroupNames  Measure1    Measure2    Measure3  etc
1   group1      value1      value1
2   group1      value2      value2

But how do I put "GroupNames" into the above subset command automatically, without knowing it ahead of time? My current experiments, using:

Titles<-names(mydata)

then

g1<-subset(mydata, Titles[1] %in% Groups[1])

fail and return:

[1] GroupNames Measure1 Measure2 Measure3     
<0 rows> (or 0-length row.names)
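The empty result comes from what the condition actually compares. A quick check on a small hypothetical data frame with the same layout:

```r
# Hypothetical sample data with the question's layout
mydata <- data.frame(
  GroupNames = c("group1", "group1", "group2", "group2", "group2", "group3", "group3"),
  Measure1   = c(1.2, 1.5, 2.1, 2.4, 2.2, 3.0, 3.3),
  Measure2   = c(3.4, 3.1, 2.9, 3.0, 3.3, 2.5, 2.7)
)
Titles <- names(mydata)
Groups <- levels(factor(mydata[[1]]))

Titles[1]                         # "GroupNames": just a string, not the column
cond <- Titles[1] %in% Groups[1]
cond                              # FALSE: "GroupNames" is not "group1"
length(cond)                      # 1, so the same FALSE is applied to every row

nrow(subset(mydata, cond))        # 0
```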

Sorry, but I am a beginner...

A: 

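You're not far off, actually. You just need to pass the column itself (an object) instead of its name as a character string: `Titles[1] %in% Groups[1]` only asks whether the string "GroupNames" equals "group1", which gives a single FALSE, so subset() keeps no rows. It should work if you skip the Titles business and just do this: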

g1<-subset(mydata, mydata[[1]] %in% Groups[1])
Fojtasek
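`[[ ]]` also accepts a column name given as a string, so the `Titles` idea works too once the name is used to pull out the column. A sketch on hypothetical data comparing the position-based and name-based forms:

```r
# Hypothetical sample data with the question's layout
mydata <- data.frame(
  GroupNames = c("group1", "group1", "group2", "group2", "group2", "group3", "group3"),
  Measure1   = c(1.2, 1.5, 2.1, 2.4, 2.2, 3.0, 3.3),
  Measure2   = c(3.4, 3.1, 2.9, 3.0, 3.3, 2.5, 2.7)
)
Titles <- names(mydata)
Groups <- levels(factor(mydata[[1]]))

# Position-based, as in the answer
g1 <- subset(mydata, mydata[[1]] %in% Groups[1])

# Name-based: Titles[1] is "GroupNames", and mydata[["GroupNames"]] is the column
g1_by_name <- subset(mydata, mydata[[Titles[1]]] %in% Groups[1])

identical(g1, g1_by_name)   # TRUE
nrow(g1)                    # 2 rows, both from group1
```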
So close and yet so far... Thanks!
You don't need subset for that: `g1<-mydata[mydata[[1]] %in% Groups[1], ]`
hadley
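A sketch comparing the two forms on hypothetical data. They agree here because `%in%` never returns `NA`; with `==` and missing group labels, `[` would return all-`NA` rows for those entries, while `subset()` drops them.

```r
# Hypothetical sample data with the question's layout
mydata <- data.frame(
  GroupNames = c("group1", "group1", "group2", "group2", "group2", "group3", "group3"),
  Measure1   = c(1.2, 1.5, 2.1, 2.4, 2.2, 3.0, 3.3),
  Measure2   = c(3.4, 3.1, 2.9, 3.0, 3.3, 2.5, 2.7)
)
Groups <- levels(factor(mydata[[1]]))

g1_subset  <- subset(mydata, mydata[[1]] %in% Groups[1])   # answer's version
g1_bracket <- mydata[mydata[[1]] %in% Groups[1], ]         # plain row indexing

all.equal(g1_subset, g1_bracket)   # TRUE
```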
Thanks. Now I just have to learn how to apply that command repeatedly, with the index in `Groups[1]` running automatically from 1 to `length(Groups)` and the name `g1` changing correspondingly to g2, ..., gn. For a non-programmer this is very interesting... but slow.
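Rather than creating separate variables g1, g2, ..., gn, one option is to keep all the subsets in a single named list. A sketch with hypothetical data using base R's `split()`, plus the equivalent explicit loop over `Groups`:

```r
# Hypothetical sample data with the question's layout
mydata <- data.frame(
  GroupNames = c("group1", "group1", "group2", "group2", "group2", "group3", "group3"),
  Measure1   = c(1.2, 1.5, 2.1, 2.4, 2.2, 3.0, 3.3),
  Measure2   = c(3.4, 3.1, 2.9, 3.0, 3.3, 2.5, 2.7)
)

# One data frame per group, collected in a list named after the groups
by_group <- split(mydata, mydata[[1]])
names(by_group)        # "group1" "group2" "group3"
by_group[["group2"]]   # plays the role of "g2"

# The same result as an explicit loop over the group labels
Groups <- levels(factor(mydata[[1]]))
by_group_loop <- lapply(Groups, function(g) mydata[mydata[[1]] %in% g, ])
names(by_group_loop) <- Groups
```

`by_group[[i]]` then replaces gi. `assign(paste0("g", i), ...)` could create the numbered variables literally, but a list is easier to loop over afterwards.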
@user441706 Maybe you want `dlply(mydata, names(mydata)[1], function(X) some_code)` (from the plyr package). Or `tapply` from base R.
Marek
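A base-R sketch along the lines of the `tapply` suggestion, on hypothetical data, selecting columns only by position. The Kruskal-Wallis test at the end is only a placeholder assumption for whatever the "required tests" are; swap in the test you actually need.

```r
# Hypothetical sample data: group column first, measures after it
mydata <- data.frame(
  GroupNames = c("group1", "group1", "group2", "group2", "group2", "group3", "group3"),
  Measure1   = c(1.2, 1.5, 2.1, 2.4, 2.2, 3.0, 3.3),
  Measure2   = c(3.4, 3.1, 2.9, 3.0, 3.3, 2.5, 2.7)
)

groups <- mydata[[1]]   # the group labels, whatever the column is called

# tapply(X, INDEX, FUN): mean of the first measure within each group
tapply(mydata[[2]], groups, mean)

# Every measure at once: a groups x measures matrix of means
means <- sapply(mydata[-1], function(col) tapply(col, groups, mean))
means

# Placeholder test (assumption): Kruskal-Wallis per measure column,
# stored in a list named after the measures
measure_cols <- names(mydata)[-1]
results <- lapply(setNames(measure_cols, measure_cols),
                  function(m) kruskal.test(mydata[[m]], factor(groups)))
sapply(results, function(r) r$p.value)
```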