views:

70

answers:

3

Given a desired number of partitions, the partitions should be nearly equal in size. This question handles the problem for a list. Its answers do not have the random property, but that is easy to add. My problem is that I have an iterator as input, so shuffle does not apply. The reason is that I want to randomly partition the nodes of a graph. The graph can be very large, so I am looking for a solution that does not just build an intermediate list.

My first idea was to use compress() with a random number generator as the selector, but that only works for two partitions.

+1  A: 

You're just dealing to various partitions, right?

import random

def dealer( iterator, size ):
    # Tag each item with a uniformly random partition number.
    for item in iterator:
        yield random.randrange( size ), item

Won't that get you started by assigning each item to a partition?

Then you can do something like this to make lists. Maybe not a good thing, but it shows how to use the function.

def make_lists( iterator, size ):
    # One fresh empty list per partition.
    the_lists = [ [] for _ in range(size) ]
    for partition, item in dealer( iterator, size ):
        the_lists[partition].append(item)
    return the_lists
S.Lott
You could shorten the second part by using groupby from itertools (a sketch follows below).
wheaties
@wheaties: while true, it's not clear what the partitions are being used for.
S.Lott
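For concreteness, the groupby variant might look something like this (a sketch building on dealer() above, not code from the thread). Note that groupby needs its input sorted by key, so it materializes and sorts all the (partition, item) pairs, giving up the streaming property, and partitions that happen to receive no items are simply absent from the result:

from itertools import groupby
from operator import itemgetter

def make_lists_groupby( iterator, size ):
    # Sorting materializes every (partition, item) pair in memory.
    dealt = sorted(dealer(iterator, size), key=itemgetter(0))
    # Consecutive pairs with the same partition number form one group.
    return [ [item for _, item in group]
             for _, group in groupby(dealt, key=itemgetter(0)) ]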
+1  A: 

You could just create k lists. When you receive a value, pick a random integer x between 0 and k-1, and put the value into the x-th list.

On average each list will contain N/k elements, but with a standard deviation of √(N * 1/k * (1-1/k)).

import random

def random_partition(k, iterable):
    # One list per partition; each value goes to a uniformly random one.
    results = [[] for _ in range(k)]
    for value in iterable:
        x = random.randrange(k)
        results[x].append(value)
    return results
KennyTM
Can you give a source or explanation for that standard deviation?
Space_C0wb0y
@Space: This is just a binomial distribution.
KennyTM
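To spell that out: each of the N items lands in a given list independently with probability p = 1/k, so each list's size follows a Binomial(N, p) distribution, whose standard deviation is √(N * p * (1-p)) = √(N * 1/k * (1-1/k)).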
A: 

You can make the lengths of the lists more uniform by adjusting the weights based on how many nodes have been dealt to each partition so far. They'll be roughly equal in length if you pick a weight function that drops to 0 when (number of nodes in partition i) ≥ (number of nodes)/(number of partitions), i.e.

weight[i] = max(numNodes / numPartitions - nodesSoFar[i], 0)

(The max() is to stop negative weights, which might happen if you have 4 nodes and 3 partitions.)

Then pick a random number from 1 to sum(weights) (or 0 to sum(weights) - 1) and choose the partition accordingly.
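A minimal sketch of that scheme (assuming the total node count is known up front, which the weight formula requires; the names are mine, not from the answer):

import random

def weighted_partition(iterable, num_nodes, num_partitions):
    # Target size for every partition; requires knowing the total count.
    target = num_nodes / num_partitions
    results = [[] for _ in range(num_partitions)]
    for item in iterable:
        # A partition's weight drops to 0 once it reaches its target,
        # so it can no longer be chosen.
        weights = [max(target - len(p), 0) for p in results]
        i = random.choices(range(num_partitions), weights=weights)[0]
        results[i].append(item)
    return results

(random.choices is Python 3.6+; it handles the "pick a random number up to sum(weights)" step for you.)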

compress() works provided you use a different selector per partition; something like (x == n for x in random_partition_numbers), where random_partition_numbers is a generator. You'll need to copy random_partition_numbers for each partition, of course. This design is inherently slower, since it iterates over the node stream once per partition (see the sketch below).

tc.
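One possible rendering of that compress() design (my sketch, not tc.'s code; itertools.tee copies both streams, but tee buffers everything if you exhaust one partition before touching the others, so it only avoids the intermediate list when the partitions are consumed roughly in parallel):

import random
from itertools import compress, count, tee

def compress_partition(iterable, k):
    # k copies of the node stream, and k identical copies of the
    # random partition-number stream.
    node_streams = tee(iterable, k)
    number_streams = tee((random.randrange(k) for _ in count()), k)

    def select(nodes, numbers, n):
        # Partition n keeps exactly the items whose random number is n.
        return compress(nodes, (x == n for x in numbers))

    return [select(nodes, numbers, n)
            for n, (nodes, numbers) in enumerate(zip(node_streams, number_streams))]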