ansaurus

Question

How to add "weights" to a MySQL table and select random values according to these?

Answer 1

+1 A:

The easiest (and maybe best/safest?) way to do this is to add those rows to the table as many times as you want the weight to be - say I want "Tree" to be found 2x more often then "Dog" - I insert it 2 times into the table and I insert "Dog" once and just select elements at random one by one.

If the rows are complex/big then it would be best to create a separate table (weighted_Elements or something) in which you'll just have foreign keys to the real rows inserted as many times as the weights dictate.

RnR 2009-11-30 11:54:18

+1 I also prefer to count in unary.

jleedev 2009-11-30 12:10:48

It tends to lead to the most efficient approach too doesn't it? ;) (at least considering the current options) Especially when selections are more frequent then insertions/re-weights (which is probably a typical scenario). It's a single query instead of a linear loop through all the elements each time and it will help avoid a nasty complexity problem when trying to get X random elements with weights Y times a second... :)

RnR 2009-11-30 20:08:48

Actually, if you want to get X random elements, you just have to sort by writing a clever comparison function (although I don't know how easy that is in SQL). http://code.google.com/p/quodlibet/source/browse/quodlibet/quodlibet/browsers/search.py#117

jleedev 2009-12-01 05:11:27

But this one is definitely more natural in SQL and more efficient for this problem.

jleedev 2009-12-01 05:12:08

Answer 2

+2 A:

I found this nice little algorithm in Quod Libet. You could probably translate it to some procedural SQL.

function WeightedShuffle(list of items with weights):
  max_score ← the sum of every item’s weight
  choice ← random number in the range [0, max_score)
  current ← 0
  for each item (i, weight) in items:  
    current ← current + weight  
    if current ≥ choice or i is the last item:  
      return item i

jleedev 2009-11-30 11:55:04

+1 for where you found it, I just love Quod Libet.

AttishOculus 2009-11-30 12:00:41

Answer 3

+1 A:

The best possible scenario (if i understand your question properly) is to setup your table as you normally would and then add two columns both INT's.

Column 1: Weight - This column would hold your weight value going from -X to +X, X being the highest value you want to have as a weight (IE: X=100, -100 to 100). This value is populated to give the row an actual weight and increase or decrease the probability of it coming up.
Column 2: *Count** - This column would hold the count of how many times this row has come up, this column is needed only if you want to use fair weighting. Fair weighting prevents one row from always showing up. (IE: if you have one row weighted at 100 and another at 2 the row with 100 will always show up, this column will allow weight 2 to be more 'valueable' as you get more weight 100 results). This column should be incremented by 1 each time a row result is pulled but you can make the logic more advanced later so it adds the weight etc.
Logic: - Its really simple now, your query simply has to request all rows as you normally would then make an extra select that (you can change the logic here to whatever you want) takes the weights and subtracts the count and order by that column.

The end result should be a table where you will get your weights appearing more often until a certain point where the system will evenly distribute itself out (leave out column 2) and you will have a system that will always return the same weighted order unless you offset the base of the query (IE: LIMIT [RANDOM NUMBER], [NUMBER OF ROWS TO RETURN])

Zyris Development Team 2009-11-30 11:58:52

Answer 4

+1 A:

I'm not an expert in probability theory, but assuming you have a column called WEIGHT, how about

select FIELD_1, ... FIELD_N, (rand() * WEIGHT) as SCORE
  from YOURTABLE
 order by SCORE
 limit 0, 10

This would give you 10 records, but you can change the limit clause, of course.

Tom Bartel 2009-12-01 07:53:59

Downvote. It is not difficult to see that it fails to generate the right results with weights (1,2,2,2,2,2) (in particular, the chance of selecting the row with weight 1 is 1/32 if it happens to roll rand()=1, which is pretty unlikely anyway).

tc. 2010-08-06 17:08:42

ansaurus

tags:

views:

answers:

How to add "weights" to a MySQL table and select random values according to these?

related questions