ansaurus

Question

Answer 1

+3 A:

Take the list of and find the cumulative total of the weights: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number:

def select( values ):
    variate = random.random() * sum( values.values() )
    cumulative = 0.0
    for item, weight in values.items():
        cumulative += weight
        if variate < cumulative:
            return item
    return item # Shouldn't get here, but just in case of rounding...

print select( { "a": 70, "b": 20, "c": 10 } )

This solution, as implemented, should also be able to handle fractional weights and weights that add up to any number so long as they're all non-negative.

Boojum 2010-09-07 02:58:08

When I first saw this answer it didn't have any code in it. Looks like we were busy coming up with essentially the same code at the same time.

Mark Ransom 2010-09-07 03:34:31

Answer 2

+1 A:

I think you can have an array of small objects (I implemented in Java although I know a little bit C# but I am afraid can write wrong code), so you may need to port it yourself. The code in C# will be much smaller with struct, var but I hope you get the idea

class PercentString {
  double percent;
  String value;
  // Constructor for 2 values
}

ArrayList<PercentString> list = new ArrayList<PercentString();
list.add(new PercentString(70, "a");
list.add(new PercentString(20, "b");
list.add(new PercentString(10, "c");

double percent = 0;
for (int i = 0; i < list.size(); i++) {
  PercentString p = list.get(i);
  percent += p.percent;
  if (random < percent) {
    return p.value;
  }
}

vodkhang 2010-09-07 03:02:03

Sorry for misunderstanding the requirement, I changed my code

vodkhang 2010-09-07 03:14:29

Answer 3

A:

Let T = the sum of all item weights
Let R = a random number between 0 and T
Iterate the item list subtracting each item weight from R and return the item that causes the result to become <= 0.

ChrisH 2010-09-07 03:09:41

Answer 4

+4 A:

For Python:

>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
... 
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a a a a c c a c a a c a
>>>

General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-items list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.

Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).

Alex Martelli 2010-09-07 03:10:21

Hm, I’m not sure about the performance characteristics of creating a list of hundreds of entries when only three are required.

Timwi 2010-09-07 03:23:49

This works OK (but not optimal) when the percentages are all integers, but what if they're arbitrary real numbers? There are better solutions.

Mark Ransom 2010-09-07 03:31:38

@Timwi, have you _measured_? The list being created once, and then many random numbers being generated from it, you might be surprised how well this performs. @Mark, I did say this is not optimal if you are given floats so incredibly precise that you need to match many digits of them in your expected probability distribution (not a sensible spec, mind you, but then, whoever's specifying and paying for the code is not always a sensible person, especially when they're paying with other people's money...;-). The OP says "percentages" and those are often rounded to the nearest percent, ya'know?

Alex Martelli 2010-09-07 04:03:10

@Alex, you're absolutely right, this does meet the spec. It's also very easy to understand once you decipher the `picks` generator. I just find it hard to recommend a limited solution when a more general one is nearly as simple.

Mark Ransom 2010-09-07 04:17:45

@Mark, my code, when made into a function, is actually simpler than yours -- and _performance_ is potentially quite a bit better, when the conditions are met. The "picks generator" (which isn't one -- it's a list comprehension) can of course easily be refactored into a loop -- it's a _preliminary_ anyway (not executed on _every_ call, only on those where the desired probabilities _change_) so the listcomp's or loop's performance is probably going to be amortized away in any normal, useful, sensible case.

Alex Martelli 2010-09-07 04:28:58

@Alex, you've convinced me. And sorry for my imprecise terminology.

Mark Ransom 2010-09-07 14:16:25

@Mark, absolutely NP -- great to see you're such a reasonable person as to care about the discussion, tx!

Alex Martelli 2010-09-07 15:09:38

Answer 5

+3 A:

Here is a complete solution in C#:

public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();
    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}

Usage:

var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};

// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());

Timwi 2010-09-07 03:22:32

Answer 6

+2 A:

def weighted_choice(probabilities):
    random_position = random.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None

Because random.random will always return < 1.0, the final return should never be reached.

Mark Ransom 2010-09-07 03:24:47

Answer 7

+1 A:

import random

def selector(weights):
    i=random.random()*sum(x for x,y in weights)
    for w,v in weights:
        if w>=i:
            break
        i-=w
    return v

weights = ((70,'a'),(20,'b'),(10,'c'))
print [selector(weights) for x in range(10)]

it works equally well for fractional weights

weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print [selector(weights) for x in range(10)]

If you have a lot of weights, you can use bisect to reduce the number of iterations required

import random
import bisect

def make_acc_weights(weights):
    acc=0
    acc_weights = []
    for w,v in weights:
        acc+=w
        acc_weights.append((acc,v))
    return acc_weights

def selector(acc_weights):
    i=random.random()*sum(x for x,y in weights)
    return weights[bisect.bisect(acc_weights, (i,))][1]

weights = ((70,'a'),(20,'b'),(10,'c'))
acc_weights = make_acc_weights(weights)    
print [selector(acc_weights) for x in range(100)]

Also works fine for fractional weights

weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)    
print [selector(acc_weights) for x in range(100)]

gnibbler 2010-09-07 03:34:10

Answer 8

+3 A:

Knuth references Walker's method of aliases. Searching on this, I find http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives the exact probabilities required in constant time per number generated with linear time for setup (curiously, n log n time for setup if you use exactly the method Knuth describes, which does a preparatory sort you can avoid).

mcdowella 2010-09-07 04:26:52

ansaurus

tags:

views:

answers:

selection based on percentage weighting

related questions