ansaurus

Question

Answer 1

+1 A:

Your example data frame has 27 rows and you're looking at every set (except the null set) so that's 2^27 - 1 = 134 217 727 operations at least... this is going to take some time. That said, here's what I believe to be a more efficient version of your code. It seems to match the Wikipedia article at least: http://en.wikipedia.org/wiki/Banzhaf_power_index

banzhaf1 <- function(data, quota) {
  n <- nrow(data)
  vote <- data$vote
  swingsPerIndex <- numeric(n)
  for (setSize in 1:n) {
    sets <- utils::combn(n, setSize)
    numSets <- ncol(sets)
    flatSets <- as.vector(sets)
    voteMatrix <- matrix(vote[flatSets], nrow=setSize, ncol=numSets)
    totals <- colSums(voteMatrix)
    aboveQuota <- totals >= quota
    totalsMatrix <- matrix(rep(totals, each=setSize), nrow=setSize, ncol=numSets)
    winDiffs <- totalsMatrix[, aboveQuota] - voteMatrix[, aboveQuota]
    winSets <- sets[, aboveQuota]
    swingers <- as.vector(winSets[winDiffs < quota])
    swingsPerIndex <- swingsPerIndex + tabulate(swingers, n)
  }
  return(data.frame(name=data$member, score=swingsPerIndex / sum(swingsPerIndex)))
}

(I haven't tried running this on the full data set.)

I think to really approach this problem efficiently, you'll have to take advantage of the structure of the problem. For instance, once you know set X has vote sum above quota, then you know that X union Y is also above quota. I'm not sure if R will be well-suited to following such structure.

David F 2010-09-17 18:30:27

Answer 2

+1 A:

My approach was similar to David's, using batched matrix operations to handle the size:

banzhaf = function(votes, pass=sum(votes) %/% 2 + 1, batch.size=500000, quiet=batches == 1) {
  n = length(votes)
  batches = ceiling((2^n / batch.size))
  if (!quiet)
    cat('calculating...\n')
  Reduce(`+`, lapply(1:batches, function(b) {
    if (!quiet)
      cat('-', b, '/', batches, '\n')
    i = ((b - 1) * batch.size + 1):min(2^n, b * batch.size)
    m = do.call(cbind, lapply(as.integer(2^((1:n) - 1L)), function(j, k) (k %/% j) %% 2L, i))
    x = drop(m %*% votes)
    passed = x >= pass
    colSums((outer(x[passed] - pass, votes, `<`) * m[passed, , drop=F]))
  }))
}

Uses R's name propagation instead of a data.frame, avoid loops where possible, and use integers instead of numerics if possible. Still took over 6 minutes to run on my box:

# wikipedia examples
banzhaf(c(A=4, B=3, C=2, D=1), 6)
banzhaf(c('Hempstead #1'=9, 'Hempstead #2'=9, 'North Hempstead'=7, 'Oyster Bay'=3, 'Glen Cove'=1, 'Long Beach'=1), 16)

# stackoverflow data
system.time(banzhaf(setNames(as.integer(z), x), 255))

The thinking went something like:

2^n possible outcomes (2 outcomes per player, n independent players)
represented by the numbers the 1:2^n (cf 'i')
expressing the number in binary gives each player's vote.
using modulus and division to extract the bits into a voting matrix (cf 'm'), in lieu of bitwise ops (only added to R recently I believe).

After that I think it plays out in the same manner as David's. The only complication was ensuring use of integers for efficiency, and adding the batching as its not really feasible to create a matrix of 27:2^27!

Charles 2010-09-17 19:00:29

Could you say more about what this code is doing, particularly this bit?

David F 2010-09-20 00:47:45

this bit: lapply(as.integer(2^((1:n) - 1L)), function(j, k) (k %/% j) %% 2L, i)

David F 2010-09-20 00:48:17

(answer moved to the body text for formatting)

Charles 2010-09-20 04:52:02

Cool. Good to have this method in mind for extracting bits. Thanks!

David F 2010-09-20 06:38:56

ansaurus

tags:

views:

answers:

Calculating voting power indexes in R

related questions