I have some survey data that I want to describe by political party and state.

I'm having some trouble with the by() aggregation command. It works with many functions, but not with length(). For example:

by(x, list(party=nn$info$party,state=nn$info$st),mean)

works fine but not

by(x, list(party=nn$info$party,state=nn$info$st),length)

It returns an array filled not with the counts I'm looking for but with a series of 1s. This is what it looks like for Alabama:

party: D
state: AL
[1] 1
--------------------------------------------------------------------------- 
party: I
state: AL
[1] 1
--------------------------------------------------------------------------- 
party: R
state: AL
[1] 1
---------------------------------------------------------------------------

Very mystifying. Any ideas?

+4  A: 

OK, I'm going to guess that x is a data frame, in which case length() returns the number of columns, not the number of rows. You want nrow() instead. Note that if foo is a data frame, foo$bar returns that column as a plain vector, whereas foo["bar"] returns a data frame with one column (so length() on it is 1).

> by(1:10, rep(1:5, 2), length)
rep(1:5, 2): 1
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 2
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 3
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 4
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 5
[1] 2
> by(data.frame(1:10), rep(1:5, 2), length)
rep(1:5, 2): 1
[1] 1
------------------------------------------------------------ 
rep(1:5, 2): 2
[1] 1
------------------------------------------------------------ 
rep(1:5, 2): 3
[1] 1
------------------------------------------------------------ 
rep(1:5, 2): 4
[1] 1
------------------------------------------------------------ 
rep(1:5, 2): 5
[1] 1
> by(data.frame(1:10), rep(1:5, 2), nrow)
rep(1:5, 2): 1
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 2
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 3
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 4
[1] 2
------------------------------------------------------------ 
rep(1:5, 2): 5
[1] 2
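A small sketch of the distinction described in the answer, using a made-up data frame (df and one_col are hypothetical names): a data frame is a list of columns, so length() counts columns while nrow() counts rows.

```r
# A data frame is a list of columns, so length() counts columns.
df <- data.frame(a = 1:4, b = 5:8)
length(df)   # 2: number of columns
nrow(df)     # 4: number of rows

# Even a one-column data frame has length 1, however many rows it holds.
one_col <- data.frame(a = 1:10)
length(one_col)  # 1
nrow(one_col)    # 10

# $ extracts the column as a plain vector; [ ] keeps a one-column data frame.
length(one_col$a)     # 10
length(one_col["a"])  # 1
```

This is why a function that receives each group as a one-column data frame reports 1 for every group when asked for length().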
Jonathan Chang
x is actually a vector of length n, and nn$info is a data frame with n rows. What I'm trying to do is summarize x by various factors in nn$info, e.g., what is the average x among Republicans in Texas?
bshor
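For a plain vector summarized by factors held elsewhere, base R's tapply() is one possible fit. The sketch below is an assumption-laden illustration: info and x are hypothetical stand-ins for nn$info and the survey vector, with invented values, and tapply() is swapped in for by().

```r
# Hypothetical stand-ins for the question's x and nn$info (not real survey data).
info <- data.frame(
  party = c("R", "R", "D", "R", "D", "I"),
  st    = c("TX", "TX", "TX", "AL", "AL", "AL"),
  stringsAsFactors = FALSE
)
x <- c(10, 20, 5, 7, 3, 1)

# tapply() applies FUN to the pieces of the vector x, so length() sees a
# plain vector and returns the number of observations in each cell.
means  <- tapply(x, list(party = info$party, state = info$st), mean)
counts <- tapply(x, list(party = info$party, state = info$st), length)

means["R", "TX"]   # average x among Republicans in Texas: 15
counts["R", "TX"]  # number of Republicans in Texas: 2
```

The results are party-by-state matrices, so a single cell can be pulled out by name. Combinations with no respondents (here, I in TX) come back as NA.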
+2  A: 
Richie Cotton
x is a vector (of length n) and nn$info is a data frame (with n rows), but this did the trick without any reference to x. Perfect!
bshor
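Counting respondents per party and state depends only on group membership, so x is not needed at all. One base-R way to get such counts is table() on the grouping columns; this is a sketch on hypothetical data (info stands in for nn$info), not necessarily the approach from the answer above.

```r
# Hypothetical grouping data standing in for nn$info.
info <- data.frame(
  party = c("R", "R", "D", "R", "D", "I"),
  st    = c("TX", "TX", "TX", "AL", "AL", "AL"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates the grouping columns directly; no data vector
# is required because a count only depends on group membership.
counts <- table(party = info$party, state = info$st)
print(counts)
counts["R", "TX"]  # 2
counts["I", "TX"]  # 0: empty cells are counted as zero, not NA
```

A side benefit of table() is that combinations with no respondents show up as 0 rather than NA.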