ansaurus

Question

Using reshape + cast to aggregate over multiple columns

Answer 1

+6 A:

Hadley's plyr package may help you:

library(plyr)
ddply(df, .(seat), function(x) data.frame(winner=x[which.max(x$votes),]$party, voteshare=max(x$votes)/sum(x$votes)))

kohske 2010-05-06 14:52:45

Thanks. That's exactly what I wanted.

DamonJW 2010-05-06 17:18:28

Or, more succinctly (and soon to be faster): `ddply(df, .(seat), summarise, winner = party[which.max(votes)], voteshare = max(votes) / sum(votes))`

hadley 2010-05-08 03:50:31
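If plyr isn't available, the same split-apply-combine step can be written in base R. This is a minimal sketch, not hadley's code: the toy `df` is rebuilt from the vote matrix printed in Answer 2, and its column names (`seat`, `party`, `votes`) are assumed from the code in the answers, since the question's data isn't shown here.

```r
# Toy data: rebuilt from the vote matrix in Answer 2 (column names assumed).
df <- data.frame(
  seat  = rep(c("A", "B", "C", "D"), each = 3),
  party = rep(c("C", "Lab", "LD"), times = 4),
  votes = c(12, 1, 2, 3, 11, 10, 9, 4, 5, 6, 8, 15),
  stringsAsFactors = FALSE
)

# Split the rows by seat, summarise each piece into a one-row data frame,
# then stack the pieces: the same job ddply(df, .(seat), ...) does.
pieces <- split(df, df$seat)
results <- do.call(rbind, lapply(names(pieces), function(s) {
  x <- pieces[[s]]
  data.frame(seat      = s,
             winner    = x$party[which.max(x$votes)],
             voteshare = max(x$votes) / sum(x$votes),
             stringsAsFactors = FALSE)
}))
print(results)
```

`which.max` returns the first maximum, so a tied seat silently goes to whichever party comes first in that seat's rows.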

Answer 2

+2 A:

You may be right that there's a cunning one-liner. I tend to favour understandable over clever, especially when you're first looking at something. Here's a more verbose alternative.

library(reshape)
votes_by_seat_and_party <- as.matrix(cast(df, seat ~ party, value="votes"))

   C Lab LD
A 12   1  2
B  3  11 10
C  9   4  5
D  6   8 15
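As an aside, if you'd rather not depend on reshape, base R's `tapply` can build the same seat-by-party matrix. A sketch using an assumed toy `df` rebuilt from the matrix above; seat/party combinations with no rows come out as `NA`, and the order of the party columns follows R's sort order for the party names, which can depend on locale.

```r
# Toy data rebuilt from the matrix above (column names assumed).
df <- data.frame(
  seat  = rep(c("A", "B", "C", "D"), each = 3),
  party = rep(c("C", "Lab", "LD"), times = 4),
  votes = c(12, 1, 2, 3, 11, 10, 9, 4, 5, 6, 8, 15),
  stringsAsFactors = FALSE
)

# Group votes by (seat, party) and sum each cell; the result is a
# seat x party matrix with one row per seat and one column per party.
votes_by_seat_and_party <- tapply(df$votes, list(df$seat, df$party), sum)
print(votes_by_seat_and_party)
```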

seats <- rownames(votes_by_seat_and_party)
parties <- colnames(votes_by_seat_and_party)

winner_col <- apply(votes_by_seat_and_party, 1, which.max)
winners <- parties[winner_col]
voteshare_of_winner_by_seat <- apply(votes_by_seat_and_party, 1, function(x) max(x) / sum(x))

results <- data.frame(seat = seats, winner = winners, voteshare = voteshare_of_winner_by_seat)

  seat winner voteshare
1    A      C 0.8000000
2    B    Lab 0.4583333
3    C      C 0.5000000
4    D     LD 0.5172414
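The two `apply` calls can also be replaced by vectorised base-R functions: `max.col` finds the winning column in each row, and matrix indexing with `cbind(row, col)` pulls out the winning vote counts. A sketch, with the matrix typed in from the output above; `ties.method = "first"` is chosen here to match `which.max`, which also returns the first maximum.

```r
# Vote counts typed in from the cast() output in Answer 2.
votes_by_seat_and_party <- matrix(
  c(12,  1,  2,
     3, 11, 10,
     9,  4,  5,
     6,  8, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("C", "Lab", "LD"))
)

# Column index of each row's maximum; "first" mimics which.max on ties.
winner_col <- max.col(votes_by_seat_and_party, ties.method = "first")

# Pick out element [i, winner_col[i]] for every row i in one step.
rows <- seq_len(nrow(votes_by_seat_and_party))
winning_votes <- votes_by_seat_and_party[cbind(rows, winner_col)]

results <- data.frame(
  seat      = rownames(votes_by_seat_and_party),
  winner    = colnames(votes_by_seat_and_party)[winner_col],
  voteshare = winning_votes / rowSums(votes_by_seat_and_party),
  stringsAsFactors = FALSE
)
print(results)
```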

# Full voteshare matrix, if you're interested
total_votes_by_seat <- rowSums(votes_by_seat_and_party)
voteshare_by_seat_and_party <- votes_by_seat_and_party / total_votes_by_seat

Richie Cotton 2010-05-06 15:36:42
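Why the last line of Answer 2 works: dividing a matrix by a vector whose length equals the number of rows recycles the vector down each column, so every cell is divided by its own seat's total. `prop.table(m, 1)` computes the same row proportions; the sketch below (matrix typed in from Answer 2's output) checks that the two agree.

```r
# Vote counts typed in from the cast() output in Answer 2.
m <- matrix(
  c(12,  1,  2,
     3, 11, 10,
     9,  4,  5,
     6,  8, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("C", "Lab", "LD"))
)

total_votes_by_seat <- rowSums(m)

# The length-4 totals vector is recycled down each column, so entry [i, j]
# is divided by total_votes_by_seat[i], the total for seat i.
voteshare_by_seat_and_party <- m / total_votes_by_seat

# prop.table with margin = 1 computes the same row proportions in one call.
same <- isTRUE(all.equal(voteshare_by_seat_and_party, prop.table(m, 1),
                         check.attributes = FALSE))
print(same)
```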

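You can treat missing values (where there was no candidate for a particular party at a given seat) as `0` or `NA`. If you use `NA`, add `na.rm = TRUE` to the `max` and `sum` calls; otherwise the voteshare for that seat comes out as `NA`.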

Richie Cotton 2010-05-06 16:00:04
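To illustrate the point about `NA`: a small hypothetical matrix (not the question's data) with no LD candidate in seat A. `max` and `sum` propagate `NA` unless `na.rm = TRUE` is given, while `which.max` already skips `NA`. Replacing `NA` with `0` gives the same shares.

```r
# Hypothetical example: seat A had no LD candidate, recorded as NA.
m <- matrix(c(12,  1, NA,
               3, 11, 10),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("C", "Lab", "LD")))

# Without na.rm, max() and sum() return NA, so seat A's share is NA.
naive_share <- apply(m, 1, function(x) max(x) / sum(x))

# Dropping the NA gives the intended shares; which.max ignores NA anyway.
voteshare <- apply(m, 1, function(x) max(x, na.rm = TRUE) / sum(x, na.rm = TRUE))
winner <- colnames(m)[apply(m, 1, which.max)]

# Alternative: treat "no candidate" as zero votes, then no na.rm is needed.
m0 <- m
m0[is.na(m0)] <- 0
voteshare0 <- apply(m0, 1, function(x) max(x) / sum(x))
```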

Answer 3

+2 A:

OK, so that's three solutions; here's another, more compact one using base R, in four short lines of code. I'm assuming missing values are either 0 or simply absent rows; it won't matter which. My guess is that this would be your fastest code for a large data set.

#get a sum for dividing
s <- aggregate(df$votes, list(seat = df$seat), sum)
#extract the winner and seat
temp <- aggregate(df$votes, list(seat = df$seat), max)
# match each row against its own seat's maximum, not any seat's maximum
res <- merge(df, temp, by.x = c("seat", "votes"), by.y = c("seat", "x"))
res$votes <- res$votes / s$x[match(res$seat, s$seat)]

Rename the columns if you wish...

names(res)[names(res) == "party"] <- "winner"
names(res)[names(res) == "votes"] <- "voteshare"

(Ties aren't handled: a tied seat will appear in more than one row of `res`, so check for duplicated seats before using the result.)

John 2010-05-06 17:16:32
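A sketch of one way to spot and break ties, using hypothetical data with a dead heat in seat B. It uses `ave` to attach each seat's maximum and total to every row, so each row is compared with its own seat's maximum. Keeping the first row per seat is an arbitrary tie-break chosen here to mirror `which.max`, not something the answer prescribes.

```r
# Hypothetical data: seat B is a dead heat between C and Lab.
df <- data.frame(
  seat  = c("A", "A", "B", "B", "B"),
  party = c("C", "Lab", "C", "Lab", "LD"),
  votes = c(12, 3, 10, 10, 4),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the max / total of that row's own seat.
seat_max   <- ave(df$votes, df$seat, FUN = max)
seat_total <- ave(df$votes, df$seat, FUN = sum)

# Keep each seat's top row(s) and compute the winner's share.
keep <- df$votes == seat_max
res <- df[keep, ]
res$voteshare <- res$votes / seat_total[keep]

# A seat listed more than once in res is a tie.
tied_seats <- unique(res$seat[duplicated(res$seat)])

# One possible tie-break: keep the first row per seat, as which.max would.
res_first <- res[!duplicated(res$seat), ]
print(tied_seats)
print(res_first)
```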
