I have a CSV with a bunch of data like so:

> test.csv <- read.csv("~/Desktop/stats.csv")
> test.csv
          m                   lvl a     b           c a_pct  b_pct  c_pct   d_pct
1    543557                    2A 13  255  59.6666667  18.8   10.2    1.6     5.1
2    545059                    2A  0   19   4.0000000  15.8   15.8    5.3    10.5

I want to plot a histogram of a column such as a_pct using hist(test.csv$a_pct), but only for qualifying rows, for example where c_pct > 20 or c < 200, much like a SQL WHERE clause. Is there an easy way to do this in R?

+2  A: 

Try this:

hist(test.csv[test.csv$c_pct > 20 | test.csv$c < 200, "a_pct"]) 

Two notes:

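  1. A data.frame is indexed by [rows, columns]. Each index can be a logical vector, a vector of names, or a vector of positions, so a logical condition in the row slot keeps only the rows where it is TRUE.
  2. You need to use | instead of ||, since | is vectorized (it compares element by element) while || is meant for single values.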
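
The two notes above can be tried on a small example. This is only a sketch: `dat` is a hypothetical stand-in for test.csv with made-up values, and `plot = FALSE` is used so the bin counts can be inspected without opening a graphics device.

```r
# Hypothetical stand-in for test.csv; the values are made up for illustration.
dat <- data.frame(
  lvl   = c("2A", "2A", "3A", "3A"),
  c     = c(59.7, 4.0, 250.0, 310.0),
  a_pct = c(18.8, 15.8, 12.1, 9.4),
  c_pct = c(1.6, 5.3, 25.0, 8.0)
)

# `|` works element by element and returns one TRUE/FALSE per row.
keep <- dat$c_pct > 20 | dat$c < 200
print(keep)

# The logical vector selects the rows; "a_pct" selects the column.
selected <- dat[keep, "a_pct"]
print(selected)

# plot = FALSE returns the histogram object without drawing it;
# drop the argument to see the actual plot.
h <- hist(selected, plot = FALSE)
print(sum(h$counts))
```

Storing the condition in `keep` first makes it easy to check how many rows qualify (for instance with `sum(keep)`) before plotting.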
Shane
I'm running into trouble using a string. Something like: `[test.csv$lvl = '2A', "a_pct"]` but that fails. Any ideas?
Wells
It's not the string. `=` is used for assignment, `==` is for logical comparison. Read through `help("<")` for more information.
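
A minimal sketch of the string comparison with `==`, assuming `lvl` is stored as a character column; `dat` is a made-up stand-in for the real data.

```r
# Hypothetical data; only the columns needed for the comparison.
dat <- data.frame(
  lvl   = c("2A", "2A", "3A"),
  a_pct = c(18.8, 15.8, 12.1),
  stringsAsFactors = FALSE
)

# `==` compares each element of lvl with the string "2A".
is_2a <- dat$lvl == "2A"
print(is_2a)

# Use the comparison result as the row index.
a_2a <- dat[is_2a, "a_pct"]
print(a_2a)
```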
Joshua Ulrich
Is this faster than subset()?
Brandon Bertelsen
+2  A: 

A simple way is just:

with( test.csv, hist( a_pct[ c_pct > 20 ] ) )
Greg Snow
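You need the `| c < 200` condition as well. In the sample rows shown, no row has `c_pct > 20`, so the selection is empty and `hist()` fails with an error.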
Joshua Ulrich
+1  A: 

Have you looked at ?subset? It returns a data frame, so pull out the column before passing it to hist():

hist(subset(test.csv, c_pct > 20 | c < 200, select=a_pct)$a_pct)
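
To check that the `subset()` route and the `with()` route pick the same values, here is a small sketch. The data frame `dat` and its values are hypothetical, and `plot = FALSE` is only there to avoid drawing.

```r
# Hypothetical stand-in for test.csv with made-up values.
dat <- data.frame(
  c     = c(59.7, 4.0, 250.0, 310.0),
  a_pct = c(18.8, 15.8, 12.1, 9.4),
  c_pct = c(1.6, 5.3, 25.0, 8.0)
)

# subset() evaluates the condition inside the data frame and
# returns a data frame, even when select names a single column.
sub <- subset(dat, c_pct > 20 | c < 200, select = a_pct)
print(class(sub))

# Extract the numeric column so hist() receives a vector.
via_subset <- sub$a_pct

# Same selection written with with() and plain vector indexing.
via_with <- with(dat, a_pct[c_pct > 20 | c < 200])
print(identical(via_subset, via_with))

h <- hist(via_subset, plot = FALSE)
print(sum(h$counts))
```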
Brandon Bertelsen