views:

311

answers:

4

I'm sick of waiting for my developers to fix this, so I decided to ask you guys. I'm trying to modify what they've already given me, so sorry if the setup doesn't make a whole lot of sense. I could probably change it but want to keep as much of the existing table setup as possible.

I've got a MySQL table with a bunch of entries in it, and a column called "Multiplier." The default (and most common) value for this column is 0, but it could be any number.

What I need to do is select a single entry from that table at random. However, the rows are weighted according to the number in the "Multiplier" column. A value of 0 means that it's not weighted at all. A value of 1 means that it's weighted twice as much, as if the entry were in the table twice. A value of 2 means that it's weighted three times as much, as if the entry were in the table three times.

I've been trying to figure out how to do this with SELECT and RAND(), but don't know how to do the weighting. Is it possible?

I would appreciate any help.

A: 

Whatever you do, it is giong to be terrible because it will involve: * Getting the total "weights" for all columns as ONE number (including applying the multiplier). * Getting a random number between 0 and that total. * Getting all entries and runing them along, deducting the weight from the random number and choosing the one entry when you run out of items.

In average you will run along half the table. Performance - unless the table is small, then do it outside mySQL in memory - will be SLOW.

TomTom
A: 

Well, I would put the logic of weights in PHP:

<?php
    $weight_array = array(0, 1, 1, 2, 2, 2);
    $multiplier = $weight_array[array_rand($weight_array)];
?>

and the query:

SELECT *
FROM `table`
WHERE Multiplier = $multiplier
ORDER BY RAND()
LIMIT 1

I think it will work :)

Silver Light
Interesting!The possible value for multiplier could theoretically be anything, but will probably go as high as 20. Wouldn't that make the array huge? Is that OK?
John
Well, you can make $weight_array dynamic, so that you dont have to type all the numbers by hand. Don't worry about resources - a thousand of int's is not much.
Silver Light
@John, then create the weight array dynamically with a for loop, by putting a 2nd for loop inside
TravisO
I'm not sure that this code do what I want it to do: Let's say I have 100 entries in the table: 98 have a multiplier of 0, 1 has a multiplier of 1 (counts as 2 entries), and 1 has a multiplier of 2 (counts as 3 entries). The chance of a 0-multiplier entry being chosen should be 98/103, of a 1-multiplier entry should be 2/103, and of a 2-multiplier entry should be 3/103. However, with your code the chances would be 1/6, 2/6, 3/6. Maybe I need to put every entry's ID into an array, with weighted entried entered multiple times, and then use array_rand?
John
+2  A: 

Don't use 0, 1 and 2 but 1, 2 and 3. Then you can use this value as a multiplier:

SELECT * FROM tablename ORDER BY (RAND() * Multiplier);
Frank Heikens
or just add 1: SELECT * FROM tablename ORDER BY (RAND() * (Multiplier+1));
Silver Light
I thought of doing something like this, but I don't see how multiplying a random number by another number results in anything getting weighted.Also, how does it know which entry to take the multiplier value from?
John
@John: RAND() gives you a random number between 0 and 1. A bigger multiplier gives you bigger chance to end up with the biggest result. Sorting on this result makes sense. Do some tests with a large dataset and see the results.
Frank Heikens
add "limit 1" to the end of the select as well to just get a single row. This is not an efficient solution though, it'll be slow on big tables.
Peter N Lewis
A: 

The result of the pseudo-code (rand(1, num) % rand(1, num)) will get more toward 0 and less toward num. Subtract the result from num to get the opposite.

So if my application language is PHP, it should look something like this:

$arr = mysql_fetch_array(mysql_query(
    'SELECT MAX(`Multiplier`) AS `max_mul` FROM tbl'
));
$MaxMul = $arr['max_mul']; // Holds the maximum value of the Multiplier column

$mul = $MaxMul - ( rand(1, $MaxMul) % rand(1, $MaxMul) );

mysql_query("SELECT * FROM tbl WHERE Multiplier=$mul ORDER BY RAND() LIMIT 1");

Explanation of the code above:

  1. Fetch the highest value in the Multiplier column
  2. calculate a random Multiplier value (weighted toward the maximum value in the Multiplier column)
  3. Fetch a random row which has that Multiplier value

It's also achievable merely by using MySQL.

Proving that the pseudo-code (rand(1, num) % rand(1, num)) will weight toward 0: Execute the following PHP code to see why (in this example, 16 is the highest number):

$v = array();

for($i=1; $i<=16; ++$i)
    for($k=1; $k<=16; ++$k)
        isset($v[$i % $k]) ? ++$v[$i % $k] : ($v[$i % $k] = 1);

foreach($v as $num => $times)
        echo '<div style="margin-left:', $times  ,'px">
              times: ',$times,' @ num = ', $num ,'</div>';
Dor
I'm racking my brain trying to understand what this code is doing, but I see some stuff there that I haven't seen before. Could you explain it in layman's terms?
John
Yes :) I've edited my post with explanation for the PHP code.
Dor
Looks good, but the majority of entries will have a multiplier of 0 and it doesn't look like this code will ever select them.
John
I can't see why not... You can assign to $mul the value of `( rand(1, $MaxMul) % rand(1, $MaxMul) )`
Dor