views:

169

answers:

1

I am implementing a roulette wheel selection, and I would like to keep as much code as possible in SQL. My attempt has yielded the query below. $1 is a random variable of the same range as weight I send to the SQL code (it was not clear how to make random() be called only once). Weight is the size of the row's slot on the wheel. random() is a SQLITE function that returns a random number. Here is the query in full:

SELECT id
FROM items
WHERE weight >= $1
ORDER BY random()
LIMIT 1

My question is, is this still the roulette wheel? The basic algorithm would require the summation of all weights, then picking a random value of the range 0..sum -- this would determine which row was selected. Instead this routine first filters all rows that meet one random number, then scrambles their order and picks the first.

One subtle change is the use of $1 instead of a second call to random(). This may make the routine more fair, but I'm not certain it does -- it would mean that each row was given it's own chance to be filtered or not.

So, I guess I am asking how much does this mirror roulette wheel, since it obviously follows different steps. But does it mirror the results?

+1  A: 

One thing I have just thought about is that this is not roulette wheel because of this simple proof by example:

If you had three items, one of each weight one, two and three, then the roulette wheel would choose them with 1/6th, 2/6ths and 3/6ths probability. However, my routine would bias higher weights:

Filter, A  ,   B,   C
  1   , 1/3, 1/3, 1/3
  2   , 0  , 1/2, 1/2
  3   , 0  , 0  , 1

Above you can see that for each of the values of filter ($1 in the question), the items A, B, and C are shown with their respective chances of selection. Adding all of this up would give combined probabilities of A, B and C to be 2/18ths, 5/18ths and 11/18ths.

This is different from roulette wheel, the query in the question seems to bias larger weights. So to answer my own question, the query mirrors roulette wheel, but does not match it.

This leads me to wonder if you picked filter on a particular non-linear distribution, could you still make this query not only mirror but match roulette wheel? And what distribution would it be?

Adam Luter
For the lack of any response, I'll settle with my own findings ;) (thanks for looking though!)
Adam Luter