I have a table that describes a variety of objects in my system (i.e. umbrella, boots, satchel, whatever). Each of these objects needs to have a distinct prevalence or incidence; for example, the umbrella is rarer than the boots. Based on those values, I need to randomly select a single object (including a blank, or 'no object found') according to its incidence.
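To pin down the target behaviour, here is a minimal Python sketch of a plain weighted pick done outside the database. The per-mill weights are hypothetical (chosen so the umbrella is rarer than the boots), and `None` stands for 'no object found':

```python
import random

# Hypothetical incidence weights (per mill); None means "no object found".
INCIDENCE = {"umbrella": 100, "boots": 400, "satchel": 250, "whatever": 200, None: 50}


def pick_one(rng):
    """Return one object, or None, with chance proportional to its weight."""
    names = list(INCIDENCE)
    weights = [INCIDENCE[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)
print(pick_one(rng))
```

Each draw yields exactly one outcome, with boots four times as likely as the umbrella and 'nothing' 5% of the time.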

Yikes. Make sense?

A:
SELECT * FROM some_table
WHERE (100*RAND()) > some_table.percent_probability
LIMIT 1

...and the probability of selection is stored in the percent_probability field.

C.

symcbean
This sounds workable but is misleading. For example, if you have 5 objects, each with a percent_probability of 20 (expecting each to show up 20% of the time), then one item will be returned 80% of the time, nothing will be returned 20% of the time, and the other 4 items will never be returned. You would have to give the items probabilities of 0, 20, 40, 60 and 80 for each to have an equal chance.
Syntax Error
@Syntax Error: yes, you're partially right. On reflection the maths is a bit more complex than you suggest, but it's easily fixed by doing a random ORDER BY and moving the filter from the WHERE clause (applied before the sort) to a HAVING clause (applied after it).
symcbean
A: 

I'm going to modify symcbean's answer for this (+1 for symcbean).

SELECT * FROM some_table
WHERE (100*RAND()) < some_table.percent_probability

This will return ALL results that match the probability you intuitively want to assign to them. For example, 5 objects with a probability of 20 will all be returned 20% of the time. Objects with a value of 90 will be returned 90% of the time.
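Assuming MySQL re-evaluates RAND() for each row in the WHERE clause, every row is an independent coin flip. A small Python model of just that filter step (the name `filter_rows` and the (name, percent) tuples are illustrative, not from the answer):

```python
import random


def filter_rows(rows, rng):
    """Model of `WHERE (100*RAND()) < percent_probability`.

    Assumes a fresh random number per row, so each (name, percent) row is
    kept independently with chance percent / 100.
    """
    return [name for name, percent in rows if 100 * rng.random() < percent]


rng = random.Random(1)
rows = [("umbrella", 20), ("boots", 20), ("satchel", 20), ("whatever", 20), ("hat", 20)]
print(filter_rows(rows, rng))  # some subset of the five names, possibly empty
```

Over many runs each 20% row appears in roughly a fifth of the results, but a single result can be empty or hold several rows, which is why a second pick is needed.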

So your result may contain more than one object (or none at all), but the rare ones still show up less often. Now just pick one of the results at random. An easy way is to put them all in an array and:

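$items = array(); // assuming you've already filled $items with your
                  // query results, one item for each array key

$count = count($items);

if ($count === 0) {
    // no row passed the filter: treat it as "no object found"
    $chosen_item = null;
} else {
    // rand() is inclusive at both ends, so 1..$count minus 1 gives a key 0..$count-1
    $chosen_key = rand(1, $count) - 1;
    $chosen_item = $items[$chosen_key];
}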
Syntax Error
Many thanks: I'm trying this out quickly, and it looks good, but I'm getting an offset error when only one result is returned: "Message: Undefined offset: 1"
Don
Me = dumb. Got it. The rand() function was starting at 1 and therefore missing the first key at [0].
Don
Edited to fix the Undefined offset error. Thanks for pointing that out.
Syntax Error
There is also a problem with this solution: the second random pick distorts the odds. For example, if you have 10 records with odds of 100, 90, 80... and the SQL random number is 0.01, the query returns the whole dataset, and the PHP random pick then gives equal odds to all the records. So the record with a '10% chance' has a 10% chance of passing the SQL filter, and because there are then 10 records it has only a 10% chance of being chosen by the PHP pick, so its overall chance becomes 1%.
Pablo
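Pablo's 1% figure can be checked exactly. The Python sketch below uses a hypothetical helper, `effective_odds` (not from the thread), and assumes, as the comment does, that a single random value x in [0, 100) is shared by the whole SQL filter `x < percent_probability`, followed by a uniform pick among the surviving rows. Percentages are assumed to be integers from 0 to 100.

```python
from fractions import Fraction


def effective_odds(percents):
    """Exact chance that each row is finally chosen.

    Assumes ONE random x, uniform on [0, 100), filters the rows with
    x < percent (a single shared draw, as in the comment), and then a
    second uniform pick chooses one of the surviving rows.
    Percentages must be integers between 0 and 100.
    """
    cuts = sorted(set([0, 100] + list(percents)))
    odds = [Fraction(0)] * len(percents)
    for lo, hi in zip(cuts, cuts[1:]):
        # For every x in [lo, hi), exactly the rows with percent >= hi survive.
        survivors = [i for i, p in enumerate(percents) if p >= hi]
        if not survivors:
            continue  # no row survives: "nothing found"
        share = Fraction(hi - lo, 100) / len(survivors)
        for i in survivors:
            odds[i] += share
    return odds


odds = effective_odds([100, 90, 80, 70, 60, 50, 40, 30, 20, 10])
print(odds[-1])  # 1/100 for the "10%" row
```

Under this model the rarest row ends up at exactly 1/100, while the 100% row climbs to roughly 29%, so the two-stage pick flattens the intended odds considerably.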
A:

If you have a write-seldom-read-many scenario (i.e. you change the objects and the probabilities seldom), you might want to pre-calculate the probability values so that a single random value unambiguously decides which object to pick (a single lookup: no sorting and no comparison against every record).

E.g. (probabilities in per-mill)
umbrella: 500‰ chance
boots: 250‰ chance
satchel: 100‰ chance
whatever: 100‰ chance
"nothing": 50‰ chance

A random number between 0 and 499 means "umbrella" has been picked, 500-749 "boots" and so on.

INSERT INTO foo (name, randmin, randmax) VALUES
  ('umbrella', 0, 499),
  ('boots', 500, 749),
  ('satchel', 750, 849),
  ('whatever', 850, 949);
-- no row covers 950-999: a random value in that range means "nothing"

Every time you add an object or modify the probabilities, re-create this table.
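Re-creating the table is mechanical: assign consecutive ranges in order. A minimal Python sketch (the helper `build_ranges` is hypothetical) that turns per-mill weights into the (name, randmin, randmax) rows used above, leaving any unassigned remainder as the "nothing" gap at the top:

```python
def build_ranges(weights):
    """Turn (name, per-mill weight) pairs into inclusive (name, randmin, randmax) rows.

    Any weight left over below 1000 becomes a gap at the top of the range,
    i.e. the "nothing" outcome.
    """
    rows, start = [], 0
    for name, weight in weights:
        rows.append((name, start, start + weight - 1))
        start += weight
    if start > 1000:
        raise ValueError("weights add up to more than 1000 per mill")
    return rows


for row in build_ranges([("umbrella", 500), ("boots", 250), ("satchel", 100), ("whatever", 100)]):
    print(row)
```

With the example weights it reproduces the four rows of the INSERT, from ('umbrella', 0, 499) to ('whatever', 850, 949), and the remaining 50‰ stays unassigned.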

Then all you need is a query like

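SELECT
  f.name
FROM
  (
    -- FLOOR gives a uniform integer 0..999; ROUND would give 0..1000
    -- with the two endpoints only half as likely as the other values
    SELECT FLOOR(RAND()*1000) AS r
  ) AS tmp
JOIN
  foo AS f
ON
  tmp.r BETWEEN f.randmin AND f.randmax
LIMIT
  1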

Only one random value has to be generated and MySQL can use an index on (randmin,randmax) to find the record quickly.
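The same lookup can be sketched in Python: keep the range starts sorted (playing the role of the index on randmin), find the last start at or below r with `bisect`, and check r against that row's randmax. The `RANGES` data mirrors the INSERT above; the helpers `lookup` and `pick` are hypothetical, and r is assumed to be a uniform integer from 0 to 999.

```python
import bisect
import random

# Mirrors the rows of the foo table: (name, randmin, randmax), sorted by randmin.
RANGES = [("umbrella", 0, 499), ("boots", 500, 749),
          ("satchel", 750, 849), ("whatever", 850, 949)]
STARTS = [randmin for _, randmin, _ in RANGES]


def lookup(r):
    """Return the name whose [randmin, randmax] contains r, or None for a gap."""
    i = bisect.bisect_right(STARTS, r) - 1  # last row with randmin <= r
    if i >= 0 and r <= RANGES[i][2]:
        return RANGES[i][0]
    return None  # 950-999 falls in the gap: "nothing"


def pick(rng):
    """Draw a single uniform integer in 0..999 and look it up."""
    return lookup(rng.randrange(1000))


print(pick(random.Random(3)))
```

Only one random number is drawn per pick, and the lookup is a binary search rather than a scan over every row.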

VolkerK