views:

75

answers:

2

I have a list of counties in each state that received nonattainment status in years 1995-2005.

I want to know how many counties in each state each year that received this status.

If my data is formatted like this,

State1 Country1 YR1 Yr2 Yr3 Yr4...
State1 Country2 YR1 Yr2 Yr3 Yr4
State2 County1  Yr1 Yr2.....

Each year variable could have a 1 or a zero, since a county may gain or lose this status in a time period.

I need to count each year how many counties in each state have nonattainment status (YRx=1), but can't think of how to do it.

A: 

Is this data organized as a dataframe? If so, how are the rows defined? If your data were organized this way:

State   County  Year    Attainment  
State1   County1  1       1  
State1   County1  2       0
State1   County1  3       1
State1   County1  4       1
State1   County2  1       1
State1   County2  2       1
...

Then it would be possible to get the kind of summary data you're looking for with 1 line of code. Hopefully your notation means that your data is organized like this:

State   County  Yr1 Yr2 Yr3 Yr4
State1   County1 1  0   1   1
State1   County2 1  1   1   1

Use melt() from the reshape package to get from this format to the one laid out above.

new.df <- melt(df, id = 1:2)

It'll call the Year variable variable and the Attainment variable value. Now, with clever use of the cast function, also from the reshape package, you can get the summary you want.

counties <- cast(new.df, State ~ value, fun = length)
head(counties)

However, if your data is organized so that every state, county and year is a column, and it's only 1 row long, I think your best next step would be to reformat the data outside of R so that it looks at least like my second example.

JoFrhwld
it is organized the second way. i thought i formatted my question correctly, but when i posted it, it merged into one line.
Alison
JoFrhwld- I did that, but it game me the summaries for all years, not total for each year. I'm still working on this problem, if you have any suggestions I would appreciate it. Your help has gotten me further at least!!
Alison
+2  A: 

I used the following example:

data <- read.table(textConnection("
state county Yr1 Yr2 Yr3 Yr4
state1 county1 1 0 0 1
state1 county2 0 0 0 0
state1 county3 0 1 0 0
state1 county4 0 0 0 0
state1 county5 0 1 0 1
state2 county6 0 0 0 0
state2 county7 0 0 1 0
state2 county8 1 0 0 1
state2 county9 0 0 0 0
state2 county10 0 1 0 0
state3 county11 1 1 1 1
state3 county12 0 0 0 0
state3 county13 0 1 1 0
state3 county14 0 0 0 1
state4 county15 0 0 0 0
state4 county16 1 0 1 0
state4 county17 0 0 0 0
state4 county18 1 1 1 1
"), header = T)

library(reshape)
data2 <- melt(data, id = c("state", "county"))
cast(data2, state ~ variable, fun = sum)

Result:

   state Yr1 Yr2 Yr3 Yr4
1 state1   1   2   0   2
2 state2   1   1   1   1
3 state3   1   2   2   2
4 state4   2   1   2   1
gd047
wow, thanks a lot. i have been struggling on this all afternoon. i'll try it out.
Alison