ansaurus

Question

How to select an item from a list with known percentages in Python

Answer 1

A:

One solution is to normalize the probabilities to integers and then repeat each element once per value (e.g. a list with 2 Oranges, 1 Apple, 3 Mangos). This is incredibly easy to do (from random import choice). If that is not practical, try the code here.

Brian 2009-10-12 18:55:22

Answer 2

A:

import random
d= {'orange': 0.10, 'mango': 0.15, 'apple': 0.05}
weightedArray = []
for k in d:
  weightedArray+=[k]*int(d[k]*100)
random.choice(weightedArray)

EDITS

This is essentially what Brian said above.

Mark 2009-10-12 18:59:24

this is very inefficient in general, and likely inaccurate as soon as the probabilities aren't nice fractions.

Peter 2009-10-12 19:15:19

Answer 3

+1 A:

lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69) ]

x = 0.0
lst2 = []
for fruit, chance in lst:
    tup = (x, fruit)
    lst2.append(tup)
    x += chance

tup = (x, None)
lst2.append(tup)

import random

def pick_one(lst2):
    if lst2[0][1] is None:
        raise ValueError, "no valid values to choose"
    while True:
        r = random.random()
        for x, fruit in reversed(lst2):
            if x <= r:
                if fruit is None:
                    break  # try again with a different random value
                else:
                    return fruit

pick_one(lst2)

This builds a new list, with ascending values representing the range of values that choose a fruit; then pick_one() walks backward down the list, looking for a value that is <= the current random value. We put a "sentinel" value on the end of the list; if the values don't reach 1.0, there is a chance of a random value that shouldn't match anything, and it will match the sentinel value and then be rejected. random.random() returns a random value in the range [0.0, 1.0) so it is certain to match something in the list eventually.

The nice thing here is that you should be able to have one value with a 0.000001 chance of matching, and it should actually match with that frequency; the other solutions, where you make a list with the items repeated and just use random.choice() to choose one, would require a list with a million items in it to handle this case.

steveha 2009-10-12 19:05:42

It might be helpful to add some comments to describe what this code is doing, and why it works. Essentially, each item is given a "probability range", defined by its value in `lst2[x][0]` as the upper bound, and `lst2[x-1][0]` as its lower bound. Each element in the list is checked against the output from `random.random()`. One question: why the `while True` loop?

dcrosta 2009-10-12 19:18:51

The `while True:` loop is essential to handle the case where `fruit is None`. The problem stated that the chances might not add up to 100%, and my sample data only adds up to 0.99, so there is a 1% chance of matching the sentinel value. We break out of the loop over lst2, and try again in the `while True:` loop, getting a new random value.

steveha 2009-10-12 19:22:17

Now bearing in mind I am only 2 days into python or any programming (i.e this is the stupid disclaimer) does this start at the end of the list and test whether that fruit is the one to pick and if it isn't move onto the next smallest (or rather testing right from left).

MDA1973 2009-10-12 19:24:37

pick_one() looks at the values in lst2, using a `for` loop. The loop will look at the values in the list in order. However, I put `reversed()` around the list, which means the `for` loop will look at the list in reversed order. In my example, 'Apple' has a 0.05 chance of being chosen; the fruit after 'Apple', 'Mango', has a value of 0.15 and 'Apple' itself has a value of 0.10. Thus values less than 0.15 and greater than or equal to 0.10 should match 'Apple': we want 0.10 <= r < 0.15 So by walking backward through the list, comparing values for <=, we get this range matching behavior.

steveha 2009-10-12 19:30:50

I posted another answer, somewhat simpler than this one but doing basically the same thing. It saves explicit ranges, rather than making ascending values serve as ranges.

steveha 2009-10-12 19:40:18

Answer 4

+2 A:

You can pick items with weighted probabilities if you assign each item a number range proportional to its probability, pick a random number between zero and the sum of the ranges and find what item matches it. The following class does exactly that:

from random import random

class WeightedChoice(object):
    def __init__(self, weights):
        """Pick items with weighted probabilities.

            weights
                a sequence of tuples of item and it's weight.
        """
        self._total_weight = 0.
        self._item_levels = []
        for item, weight in weights:
            self._total_weight += weight
            self._item_levels.append((self._total_weight, item))

    def pick(self):
        pick = self._total_weight * random()
        for level, item in self._item_levels:
            if level >= pick:
                return item

You can then load the CSV file with the csv module and feed it to the WeightedChoice class:

import csv

weighed_items = [(item,float(weight)) for item,weight in csv.reader(open('file.csv'))]
picker = WeightedChoice(weighed_items)
print(picker.pick())

Ants Aasma 2009-10-12 19:22:28

Thank you very much this works exactly as I envisioned it.

MDA1973 2009-10-12 19:54:36

Using `bisect` in `pick()` method will increase it's speed for long lists (O(log n) instead of O(n)).

Denis Otkidach 2009-10-13 04:03:12

That depends on the distribution of input data. Sorting the weights largest first will reduce the amount of checks. If the weights are heavily skewed then a linear search might have a better average case than bisecting from the middle.

Ants Aasma 2009-10-13 09:31:10

Answer 5

A:

lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69) ]

x = 0.0
lst2 = []
for fruit, chance in lst:
    low = x
    high = x + chance
    tup = (low, high, fruit)
    lst2.append(tup)
    x += chance

if x > 1.0:
    raise ValueError, "chances add up to more than 100%"

low = x
high = 1.0
tup = (low, high, None)
lst2.append(tup)

import random

def pick_one(lst2):
    if lst2[0][2] is None:
        raise ValueError, "no valid values to choose"
    while True:
        r = random.random()
        for low, high, fruit in lst2:
            if low <= r < high:
                if fruit is None:
                    break  # try again with a different random value
                else:
                    return fruit

pick_one(lst2)


# test it 10,000 times
d = {}
for i in xrange(10000):
    x = pick_one(lst2)
    if x in d:
        d[x] += 1
    else:
        d[x] = 1

I think this is a little clearer. Instead of a tricky way of representing ranges as ascending values, we just keep ranges. Because we are testing ranges, we can simply walk forward through the lst2 values; no need to use reversed().

steveha 2009-10-12 19:39:23

Thank you I think one thing your solution has highlighted to me which I don't think I had appreciated to now is the necessity to ensure that errors and things I don't expect to happen are covered. This is something I haven't really thought of yet and feel like it is something I should look into more as I learn Python.

MDA1973 2009-10-12 19:57:04

Answer 6

+1 A:

What you want is to draw from a multinomial distribution. Assuming you have two lists of items and probabilities, and the probabilities sum to 1 (if not, just add some default value to cover the extra):

def choose(items,chances):
    import random
    p = chances[0]
    x = random.random()
    i = 0
    while x > p :
     i = i + 1
     p = p + chances[i]
    return items[i]

job 2009-10-12 19:43:30

+1. Clear, simple code.

Kylotan 2009-10-13 10:09:07

ansaurus

tags:

views:

answers:

How to select an item from a list with known percentages in Python

related questions