I have a data.frame called series_to_plot.df, which I created by combining a number of other data.frames (shown below). I now want to pull out just the .mm column from each of them so I can plot them. That is the 3rd column of each original data.frame (e.g. p3c3.mm, p3c4.mm etc.), but I can't see how to do this for all of them at once without looping through the names. Is this possible?

I can pull out one set with series_to_plot.df[[3]] and another with series_to_plot.df[[10]] (so it is just a list of vectors), and I can reference one directly with series_to_plot.df$p3c3.mm. But is there a command that returns the mm values from every data.frame at once? I was expecting an index like series_to_plot.df[,3[3]] to work, but it returns: Error in `[.data.frame`(series_to_plot.df, , 3[3]) : undefined columns selected

series_to_plot.df
         p3c3.rd p3c3.day   p3c3.mm    p3c3.sd p3c3.n p3c3.noo p3c3.no_NAs
    1 2010-01-04        0 0.1702531 0.04003364      7        1           0
    2 2010-01-06        2 0.1790594 0.04696674      7        1           0
    3 2010-01-09        5 0.1720404 0.03801756      8        0           0

         p3c4.rd p3c4.day   p3c4.mm     p3c4.sd p3c4.n p3c4.noo p3c4.no_NAs
    1 2010-01-04        0 0.1076581 0.006542157      6        2           0
    2 2010-01-06        2 0.1393447 0.066758781      7        1           0
    3 2010-01-09        5 0.2056846 0.047722862      7        1           0

         p3c5.rd p3c5.day    p3c5.mm     p3c5.sd p3c5.n p3c5.noo p3c5.no_NAs
    1 2010-01-04        0 0.07987147 0.006508766      7        1           0
    2 2010-01-06        2 0.11496167 0.046478767      8        0           0
    3 2010-01-09        5 0.40326471 0.210217097      7        1           0
+3  A: 

To get all columns with specified name you could do:

names_with_mm <- grep("mm$", names(series_to_plot.df), value=TRUE)
series_to_plot.df[, names_with_mm]

But if your base data.frames all have the same structure, then you can rbind them instead, something like:

series_to_plot.df <- rbind(
  cbind(name="p3c3", p3c3),
  cbind(name="p3c4", p3c4),
  cbind(name="p3c5", p3c5)
)

Then the mm values are all in one column, and it's easier to plot.
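
As a rough sketch of what that looks like end to end: the two small data.frames below are hypothetical stand-ins for the originals (the values are copied from the question, but the unprefixed column names rd, day and mm are an assumption about how the base data.frames look), and long.df and mm_by_series are names made up for the example.

```r
# Hypothetical stand-ins for two of the base data.frames; the unprefixed
# column names (rd, day, mm) are assumed, the values come from the question
dates <- as.Date(c("2010-01-04", "2010-01-06", "2010-01-09"))
p3c3 <- data.frame(rd = dates, day = c(0, 2, 5),
                   mm = c(0.1702531, 0.1790594, 0.1720404))
p3c4 <- data.frame(rd = dates, day = c(0, 2, 5),
                   mm = c(0.1076581, 0.1393447, 0.2056846))

# Stack the frames, tagging each row with the series it came from
long.df <- rbind(
  cbind(name = "p3c3", p3c3),
  cbind(name = "p3c4", p3c4)
)

# Every mm value now lives in a single column
long.df$mm

# If one vector per series is needed, split on the name column
mm_by_series <- split(long.df$mm, long.df$name)
```

With the data in this shape, plotting by series is a matter of grouping on the name column rather than hunting for column names.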

Marek
+1  A: 

The R Language Definition has some helpful information on indexing (section 3.4.1).

You can pull out the names matching a pattern with the grep() command. Then string it all together like this:

 dataWithMM <- series_to_plot.df[, grep("[mm]", names(series_to_plot.df))]

To deconstruct it a little, this gets the indices of the columns whose names match the pattern:

 namesThatMatch <- grep("[mm]", names(series_to_plot.df))

Then we use that vector of indices to select the columns we want:

  dataWithMM <- series_to_plot.df[, namesThatMatch]
JD Long
Marek's answer has a better regex than mine. "[mm]" is a character class, so it matches any column name with an "m" in it anywhere. "mm$" matches only the columns that end in "mm", which is a better fit.
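
A quick sketch of the difference between the patterns. The vector nms is hypothetical: it uses a few column names from the question plus two invented ones (p3c3.median and p3c3.mm_raw), added purely to show where the patterns diverge.

```r
# Hypothetical column names; "p3c3.median" and "p3c3.mm_raw" are invented
# for illustration and do not appear in the question's data
nms <- c("p3c3.rd", "p3c3.mm", "p3c3.median", "p3c3.mm_raw", "p3c4.mm")

# Character class: any name containing at least one "m"
any_m <- grep("[mm]", nms, value = TRUE)

# Literal "mm" anywhere in the name
mm_anywhere <- grep("mm", nms, value = TRUE)

# "mm" only at the end of the name
mm_at_end <- grep("mm$", nms, value = TRUE)
```

Here any_m picks up p3c3.median as well, mm_anywhere still includes p3c3.mm_raw, and only mm_at_end is limited to p3c3.mm and p3c4.mm.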
JD Long
+1  A: 

To add to the other answers, I don't think it is a good idea to have useful information encoded in variable names. Much better to rearrange your data so that all useful information is in the value of some variable. I don't know enough about your data set to suggest the right format, but it might be something like

p c         rd day date mm sd ...
3 3 2010-10-04 ...

Once you have done this, the answer to your question becomes simply df$mm.

If you are getting the data in a less useful form from an external source, you can rearrange it in a more useful form like the above within R using the reshape function or functions from the reshape package.
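
For illustration, here is one way the rearrangement could be sketched by hand in base R, using plain sub()/grep() and rbind rather than the reshape function or the reshape package. The data.frame wide is a cut-down toy version of the one in the question, and reading the prefix pXcY as two variables p and c is an assumption based on the layout suggested above; wide, series, pieces and long are all names made up for the example.

```r
# Toy wide data.frame shaped like the one in the question
# (two series, two variables each; values copied from the question)
wide <- data.frame(
  p3c3.day = c(0, 2, 5), p3c3.mm = c(0.1702531, 0.1790594, 0.1720404),
  p3c4.day = c(0, 2, 5), p3c4.mm = c(0.1076581, 0.1393447, 0.2056846)
)

# Series prefixes: everything before the first "."
series <- unique(sub("\\..*$", "", names(wide)))

# For each prefix: take its columns, strip the prefix from the names,
# and store the numbers encoded in the prefix as ordinary columns.
# Assumption: the prefix has the form p<number>c<number>.
pieces <- lapply(series, function(s) {
  prefix <- paste0("^", s, "\\.")
  cols <- wide[, grep(prefix, names(wide)), drop = FALSE]
  names(cols) <- sub(prefix, "", names(cols))
  cbind(p = as.integer(sub("^p([0-9]+)c([0-9]+)$", "\\1", s)),
        c = as.integer(sub("^p([0-9]+)c([0-9]+)$", "\\2", s)),
        cols)
})

# Stack the per-series pieces into one long data.frame
long <- do.call(rbind, pieces)

long$mm  # every mm value in one column
```

The reshape tools mentioned above do the same job more generally; the hand-rolled version is only meant to show what "moving information out of the names" amounts to.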

Jyotirmoy Bhattacharya
I tried to suggest this in my answer too, but I assumed that `pxcy` is the name of a partial data.frame (hence the rbind/cbind stuff). But your hint to include parts of the names as new columns is very nice.
Marek
Thanks, I'll try melt and reshape... (also see http://stackoverflow.com/questions/1181060/reshaping-time-series-data-from-wide-to-tall-format-for-plotting)
John