views:

213

answers:

2

I have a MySQL table that has a row called cur_odds which is a percent number with the percent probability that that row will get selected. How do I make a query that will actually select the rows in approximately that frequency when you run through 100 queries for example?

I tried the following, but a row that has a probability of 0.35 ends up getting selected around 60-70% of the time.

SELECT * FROM table ORDER BY RAND()*cur_odds DESC

All the values of cur_odds in the table add up to 1 exactly.

+1  A: 

Given your above SQL statement, whatever numbers you have in cur_odds are not the probabilities that each row is selected, but is instead just an arbitrary weighting (relative to the "weights" of all the other rows) which could instead be best interpreted as a relative tendency to float towards the top of the sorted table. The actual value in each row is meaningless (e.g. you could have 4 rows with values of 0.35, 0.5, 0.75 and 0.99, or you could have values of 35, 50, 75 and 99, and the results would be the same).

Update: Here's what's going on with your query. You have one row with a cur_odds value of 0.35. For the sake of illustration, I'm going to assume that the other 9 rows all have the same value (0.072). Also for the sake of illustration, let's assume RAND() returns a value from 0.0 to 1.0 (it may actually).

Every time you run this SELECT statement, each row is assigned a sorting value by multiplying its cur_odds value by a RAND() value from 0.0 to 1.0. This means that the row with a 0.35 will have a sorting value between 0.0 and 0.35.

Every other row (with a value of 0.072) will have sorting values ranging between 0.0 and 0.072. This means that there is an approximately 80% chance that your one row will have a sorting value greater than 0.072, which would mean that there is no possible chance that any other row could be sorted higher. This is why your row with the cur_odds value of 0.35 is coming up first more often than you expect.

I incorrectly described the cur_odds value as a relative change weighting. It actually functions as a maximum relative weighting, which would then involve some complex math to determine the actual relative probabilities involved.

I'm not sure what you need can be done with straight T-SQL. I've implemented a weighted probability picker many times (I was even going to ask a question about best methods for this this morning, ironically) but always in code.

MusiGenesis
Actually, I have 10 rows, and the 10 values in cur_odds equal to be 1 exactly.
James Simpson
Try multiplying all the values by 10 (so that they total to 10.0 exactly) and you'll see that you get the same ordering results. Or you could divide them all by 3, or multiply by 100 etc.
MusiGenesis
+1  A: 

If cur_odds is changed rarely you could implement the next algoritm:

1) Create another column prob_sum, for which

prob_sum[0] := cur_odds[0]

for 1 <= i <= row_count - 1:

prob_sum[i] := prob_sum[i - 1] + cur_odds[i]

2) Generate random number less than 1:

rnd := rand(0,1)

3) Find the first row for which prob_sum[i] > rnd

If you will create a BTREE index on prob_sum column, the table "scan" for this condition will be done in order of prob_sum field and should work quite fast:

CREATE INDEX prob_sum_ind ON <table> (prob_sum); --only once

SET @rnd := RAND();

SELECT * FROM <table> WHERE prob_sum > @rnd LIMIT 1;

Vitalii Fedorenko