
Hi all!

I'd like to optimize my queries, so I looked into mysql-slow.log.

Most of my slow queries contain ORDER BY RAND(). I can't find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it's enough: on poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.

Is there any solution for this issue?

Thanks!

A dummy example:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
ORDER BY
        RAND()
LIMIT 1
A:

Try this:

SELECT  *
FROM    (
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1
        FROM    t_random r
        WHERE   (@cnt := @cnt - 1)
                AND RAND() < @lim / @cnt  -- unseeded: a constant seed such as RAND(20090301) returns the same rows on every run
        ) i

This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().

The main idea is that we don't sort at all. Instead, we keep two variables, @cnt (rows still to be scanned) and @lim (rows still to be picked), and select the current row with probability @lim / @cnt.
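The "running probability" idea can be made concrete with a minimal Python sketch of the same one-pass technique, usually called selection sampling. This is not the author's code. `select_k` and its `rand` parameter are illustrative names, and the list stands in for the rows MySQL scans. Each row is kept with probability `needed / remaining`, which mirrors the `@lim / @cnt` comparison in the query.

```python
import random


def select_k(rows, k, rand=random.random):
    """Return k rows chosen uniformly at random in a single pass, no sorting.

    `remaining` plays the role of @cnt (rows not yet scanned, including the
    current one) and `needed` the role of @lim (rows still to be picked).
    """
    remaining = len(rows)
    needed = min(k, remaining)
    picked = []
    for row in rows:
        # Keep the current row with probability needed / remaining.
        # Once needed == remaining the ratio is 1, so we never fall short.
        if needed > 0 and rand() < needed / remaining:
            picked.append(row)
            needed -= 1
        remaining -= 1
    return picked


if __name__ == "__main__":
    print(select_k(list(range(1, 101)), 10))
```

The ratio reaches 1 as soon as the rows left equal the rows still needed. As a result, the sketch always returns exactly k rows when at least k exist, and every row ends up with the same k/n chance of being picked.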

See this article in my blog for more detail:

Update:

If you need to select just a single random record, try this:

SELECT  aco.*
FROM    (
        SELECT  minid + FLOOR((maxid - minid) * RAND()) AS randid
        FROM    (
                SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                FROM    accomodation
                ) q
        ) q2
JOIN    accomodation aco
ON      aco.ac_id =
        COALESCE
        (
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_id > randid
                AND ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        ),
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        )
        )

This assumes your ac_id values are distributed more or less evenly (no large gaps).
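To see how uneven ids skew the result, the following Python sketch computes the exact distribution this query produces for a given set of ids. It is a model, not the query itself. It assumes that every id passes the WHERE filters and that RAND() is uniform on [0, 1). `next_id_distribution` is a hypothetical helper name.

```python
from fractions import Fraction


def next_id_distribution(ids):
    """Exact probability of each id being returned by the MIN/MAX query.

    Models randid = minid + FLOOR((maxid - minid) * RAND()), a uniform
    integer in [minid, maxid - 1], followed by "first id greater than randid",
    wrapping around to the smallest id if none exists. The WHERE filters of
    the real query are ignored (every id is assumed to qualify).
    """
    ids = sorted(set(ids))
    lo, hi = ids[0], ids[-1]
    probs = {i: Fraction(0) for i in ids}
    if lo == hi:
        probs[lo] = Fraction(1)  # FLOOR(0 * RAND()) = 0, wraps to the only id
        return probs
    span = hi - lo
    for randid in range(lo, hi):
        chosen = next((i for i in ids if i > randid), ids[0])
        probs[chosen] += Fraction(1, span)
    return probs


if __name__ == "__main__":
    for ac_id, p in next_id_distribution([1, 2, 3, 10]).items():
        print(ac_id, p, f"{float(p):.0%}")
```

The comments below use the ids 1, 2, 3, 10 as an example. For those ids the sketch gives 10 a probability of 7/9, 2 and 3 each 1/9, and 1 a probability of 0. The smallest id can only be reached through the wraparound, which this model never triggers when there is more than one id.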

Quassnoi
Hello, Quassnoi! First of all, thanks for your fast response! Maybe it's my fault, but your solution is still unclear to me. I'll update my original post with a concrete example, and I'd be happy if you could explain your solution using that example.
fabrik
There was a typo at `JOIN accomodation aco ON aco.id =`, where aco.id should really be aco.ac_id. On the other hand, the corrected query didn't work for me: it throws error #1241 - Operand should contain 1 column(s) at the fifth SELECT (the fourth subquery). I suspect a parenthesis problem, but I haven't found it yet.
fabrik
`@fabrik`: try now. It would be really helpful if you posted the table creation scripts so that I could test the queries before posting.
Quassnoi
Thanks, it works! :) Can you change the `JOIN ... ON aco.id` part to `JOIN ... ON aco.ac_id` so I can accept your solution? Thanks again! A question: is this possibly less random than ORDER BY RAND()? I ask because this query returns some results many times.
fabrik
`@fabrik`: done. As for the randomness: yes, it's less random than `ORDER BY RAND()`. It selects a random `id` between `MIN` and `MAX`, then selects the first `id` greater than that random value, with wraparound. If you have a large gap, like `1, 2, 3, 10`, `10` will be selected in 7 out of 9 cases (about 78%). However, this is the fastest way possible. If you want a truly random solution, use the first query (just replace `t_random` with your query expressed as an inline view).
Quassnoi
A: 

Here's how I'd do it:

SET @r := (SELECT FLOOR(RAND() * COUNT(*))
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != 'draft'
        AND c.acat_slug != 'vendeglatohely'
        AND a.ac_images != 'b:0;');

SET @sql := CONCAT('
  SELECT  a.ac_id,
        a.ac_status,
        a.ac_name,
        a.ac_images
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != ''draft''
        AND c.acat_slug != ''vendeglatohely''
        AND a.ac_images != ''b:0;''
  LIMIT ', @r, ', 1');

PREPARE stmt1 FROM @sql;

EXECUTE stmt1;
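The same count-then-offset idea can be tried outside MySQL. Below is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in. The toy schema and data are made up to mirror the question. The offset is computed in Python and bound as a parameter instead of being spliced into the SQL with `CONCAT`. `random_row_by_offset` and `demo_db` are hypothetical helper names.

```python
import random
import sqlite3

# Shared FROM/WHERE part so the COUNT and the fetch see exactly the same rows.
# Table and column names follow the question; the schema below is a toy.
MATCHING = """
    FROM accomodation a
    JOIN accomodation_category c ON a.ac_category = c.acat_id
    WHERE a.ac_status != 'draft'
      AND c.acat_slug != 'vendeglatohely'
      AND a.ac_images != 'b:0;'
"""


def random_row_by_offset(conn, rng=random):
    """Step 1: count matching rows. Step 2: fetch one row at a random offset."""
    (count,) = conn.execute("SELECT COUNT(*) " + MATCHING).fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count - 1, like FLOOR(RAND() * COUNT(*))
    return conn.execute(
        "SELECT a.ac_id, a.ac_name " + MATCHING + " ORDER BY a.ac_id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()


def demo_db():
    """In-memory toy data: ids have gaps, only 121 and 200 pass the filter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accomodation_category (acat_id INTEGER PRIMARY KEY, acat_slug TEXT);
        CREATE TABLE accomodation (
            ac_id INTEGER PRIMARY KEY, ac_name TEXT, ac_status TEXT,
            ac_images TEXT, ac_category INTEGER);
        INSERT INTO accomodation_category VALUES (1, 'hotel'), (2, 'vendeglatohely');
        INSERT INTO accomodation VALUES
            (121, 'A', 'active', 'x', 1),
            (125, 'B', 'draft', 'x', 1),
            (130, 'C', 'active', 'b:0;', 1),
            (140, 'D', 'active', 'x', 2),
            (200, 'E', 'active', 'x', 1);
    """)
    return conn


if __name__ == "__main__":
    print(random_row_by_offset(demo_db()))
```

The `ORDER BY a.ac_id` is only there to make the toy output deterministic. Because the offset, not the id, is what is random, gaps in ac_id do not skew the result. The two statements are not atomic, so a row inserted or deleted between them can shift the offsets.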
Bill Karwin
See also http://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql/213242#213242
Bill Karwin
My table's ids aren't continuous because it's often edited; for example, the first id is currently 121.
fabrik
The technique above does not rely on the id values being continuous. It chooses a random row position based on COUNT(*), not a random id between 1 and MAX(id) like some other solutions.
Bill Karwin
A:

It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it's still pretty random.

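However, you should be able to do it in one query (for selecting a single row). The random threshold is computed once in a derived table, because an aggregate such as MAX(id) cannot be used directly in WHERE, and a RAND() in WHERE would be re-evaluated for every row:

SELECT [fields]
FROM   [table] t
JOIN   (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM [table])) AS rid) r
WHERE  t.id >= r.rid
ORDER BY t.id
LIMIT 1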

Other solutions:

  • Add a permanent, indexed float field called rnd to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random ORDER BY rnd LIMIT 1", falling back to the row with the smallest rnd if nothing matches.
  • Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
  • Cache the results of the query as HTML and keep it for a few hours.
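The first option (a stored random column) might look like this in a minimal Python sketch, again using `sqlite3` as a stand-in for MySQL. The `items` table, its `id` column and the helper names are hypothetical. The argument `r` plays the role of the PHP-generated `$random`.

```python
import random
import sqlite3


def add_random_column(conn, rand=random.random):
    """Add an indexed REAL column `rnd` to the (hypothetical) `items` table
    and fill it once. New rows must get their own rnd value on INSERT."""
    conn.execute("ALTER TABLE items ADD COLUMN rnd REAL")
    ids = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
    for item_id in ids:
        conn.execute("UPDATE items SET rnd = ? WHERE id = ?", (rand(), item_id))
    conn.execute("CREATE INDEX items_rnd ON items (rnd)")


def pick_by_random_column(conn, r=None):
    """Return the id of the first row whose rnd exceeds r, wrapping around."""
    if r is None:
        r = random.random()  # stands in for the number PHP would generate
    row = conn.execute(
        "SELECT id FROM items WHERE rnd > ? ORDER BY rnd LIMIT 1", (r,)
    ).fetchone()
    if row is None:  # r was larger than every stored rnd: wrap to the smallest
        row = conn.execute("SELECT id FROM items ORDER BY rnd LIMIT 1").fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
    add_random_column(conn)
    print(pick_by_random_column(conn))
```

Each row's chance equals the gap between its `rnd` value and the next smaller one (the smallest row also collects the wraparound), so the result is only approximately uniform. Refreshing the column from time to time moves that unevenness around instead of leaving it on the same rows.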
DisgruntledGoat
A: 

Dare I ask if the query is actually leveraging indexes? I am not sure how efficient the MySQL RAND() method is, but some versions of MySQL are quite fond of building the whole result set in a temporary table (possibly on disk), sorting it, and then picking the first row. If you could keep your query isolated to an index, you might get much faster performance (depending on the index size, the performance of RAND(), ...).

Can you post an explain plan for the query?

Jacob

TheJacobTaylor
An EXPLAIN for the query in my first post?
fabrik