views:

73

answers:

2
mysql> EXPLAIN SELECT * FROM urls ORDER BY RAND() LIMIT 1;
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
|  1 | SIMPLE      | urls  | ALL  | NULL          | NULL | NULL    | NULL | 62228 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-------+---------------------------------+

The above doesn't qualify as efficient,how should I do it properly?

UPDATE

Seems using the solution mentioned in the answer still doesn't help:

mysql> explain SELECT  *
    -> FROM    (
    ->         SELECT  @cnt := COUNT(*) + 1,
    ->                 @lim := 10
    ->         FROM    urls
    ->         ) vars
    -> STRAIGHT_JOIN
    ->         (
    ->         SELECT  r.*,
    ->                 @lim := @lim - 1
    ->         FROM    urls r
    ->         WHERE   (@cnt := @cnt - 1)
    ->                 AND RAND(20090301) < @lim / @cnt
    ->         ) i;
+----+-------------+------------+--------+---------------+------+---------+------+-------+------------------------------+
| id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows  | Extra                        |
+----+-------------+------------+--------+---------------+------+---------+------+-------+------------------------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL | NULL    | NULL |     1 |                              |
|  1 | PRIMARY     | <derived3> | ALL    | NULL          | NULL | NULL    | NULL |    10 |                              |
|  3 | DERIVED     | r          | ALL    | NULL          | NULL | NULL    | NULL | 62228 | Using where                  |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL |  NULL | Select tables optimized away |
+----+-------------+------------+--------+---------------+------+---------+------+-------+------------------------------+
+3  A: 

Quassnoi has written a post about selecting rows at random without performing a sort. His example selects 10 rows at random, but you can adapt it to select just one row.

If you want it to be really fast then you can use an approximation that won't be completely uniform or will sometimes fail to return a row.

You can also use a stored procedure to select a random row quickly from Bill Karwin's post:

SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

Note that this will run much faster in MyISAM than InnoDB because COUNT(*) is expensive in InnoDB but nearly instant in MyISAM.

Mark Byers
Seems it's not working in my case.
@user198729: Looking at your code above you forgot to change `lim` from 10 to 1. Maybe I didn't make that clear enough.
Mark Byers
It doesn't matter,still the same after changing 10 to 1.Pay attention to the 3rd line of explain output.
@user198729: How many rows do you have? What is your storage engine (InnoDB or MyISAM)? What is the running time of the query in seconds?
Mark Byers
It's `MyISAM`,with `62229` rows.
@user198729: What's the running time of the query in seconds?
Mark Byers
It's 0.17 secs,with the original `order by rand()` 0.13 secs
@user198729: 0.17s is pretty fast given that it uses a table scan. Note that the first method is O(n log(n)) and the second is O(n). The lack of difference in the results is probably because you don't have any data in your database so both are very fast. I think you should notice a significant difference if you put some test data into your database. At 1 million rows you should notice a 10x speed up. How many rows do you plan to have?
Mark Byers
So O(n) is the best possible performance already? I was expecting some O(log(n)) ones. As for how many rows do I plan to have,there's no limit,it grows with the time.
@user198729: If your PK is almost sequential with few deletions you can choose a random r between 1 and max(id) and find the first row that has `id >= r`. This will be close to instant but introduces a bias.
Mark Byers
@user198729: Or since you are using MyISAM you can count the number of rows, choose a random number `r` between `0` and `COUNT(*) - 1` and use a `LIMIT r, 1`.
Mark Byers
Oh I can't assume the sequential.Seems I have to go with O(n) anyway.But I don't understand what `RAND(20090301)` in @Quassnoi's solution means?
@user198729: That number is a seed to make the example reproducible. Don't include the seed in your real system.
Mark Byers
Oops,still an issue: `RAND(20090301) < @lim / @cnt`,this condition might fail all rows,right?
@user198729: No, it will succeed for exactly @lim rows. The value of @lim / @cnt changes for each row (because the values of @lim and @cnt change).
Mark Byers
I think `RAND(20090301) < @lim / @cnt` should been changed to `RAND(20090301) <= @lim / @cnt`(`<` -> `<=`) to ensure it always returns exactly @lim rows. Though practically it may never happen. What do you think?
@user198729: See the documentation for RAND: http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_rand `Returns a random floating-point value v in the range 0 <= v < 1.0.` Note that it is `< 1.0` not `<= 1.0`. Exactly 1.0 can never be returned by `RAND()`.
Mark Byers
Oh then it's correct:) In fact I found this solution more efficient and without bias,but only apply for random **1** record: http://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql/213242#213242
Simply put, the `order by` is not necessary to select a random row.
@user198729: OK, that was actually one of my suggestions above: `since you are using MyISAM you can count the number of rows, choose a random number r between 0 and COUNT(*) - 1 and use a LIMIT r, 1`. I've put it into the main post now so that people searching will find it more quickly.
Mark Byers
A: 

Well, If you can move some logic to application layer (and I didn't misunderstood your question), then all you need is to generate random ID in your application and then perform simple select for one record identified by that key. All you need to know is count of records. Oh, and if that key was deleted, get next one.

michajas