What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.
What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.
An easy but slow way would be (good for smallish tables)
SELECT * from TABLE order by RAND() LIMIT 1
Grab all the id's, pick a random one from it, and retrieve the full row.
If you know the id's are sequential without holes, you can just grab the max and calculate a random id.
If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.
If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.
Don't do that, nor should you order by a GUID, it has the same problem.
With a order yo will do a full scan table. Its best if you do a select count(*) and later get a random row=rownum between 0 and the last registry
In pseudo code:
sql "select id from table"
store result in list
n = random(size of list)
sql "select * from table where id=" + list[n]
This assumes that id
is a unique (primary) key.
I knew there had to be a way to do it in a single query in a fast way. And here it is:
A fast way without involvement of external code, kudos to
http://jan.kneschke.de/projects/mysql/order-by-rand/
SELECT name
FROM random AS r1 JOIN
(SELECT (RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1;
Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.
SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
SET @COUNTER=SELECT COUNT(*) FROM your_table;
SELECT PrimaryKey FROM your_table LIMIT 1 OFFSET (RAND() * @COUNTER);
The complexity of the first query is O(1) for MyISAM tables.
The second query accompanies a table full scan. Complexity = O(n)
Keep a separate table for this purpose only. You should also insert the same rows to this table whenever inserting to the original table. Assumption: No DELETEs.
CREATE TABLE Aux(MyPK INT AUTO_INCREMENT, PrimaryKey INT);
SET @MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET @RandPK = CAST(RANDOM() * @MaxPK, INT)
SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = @RandPK);
If DELETEs are allowed,
SET @delta = CAST(@RandPK/10, INT);
SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK BETWEEN @RandPK - @delta AND @RandPK + @delta LIMIT 1;
The overall complexity is O(1).
Hi there.
I'm a bit new to SQL but how about generating a random number in PHP and using
SELECT * FROM the_table WHERE primary_key >= $randNr
this doesn't solve the problem with holes in the table.
but here's a twist on lassevks suggestion:
SELECT primary_key FROM the_table
use mysql_num_rows() in PHP create a random number based on the above result.
SELECT * FROM the_table WHERE primary_key = rand_number
on a side note just how slow is SELECT * FROM the_table
creating a random number based on mysql_num_rows()
and then moving the data pointer to that point mysql_data_seek()
Just how slow will this be on large tables with say a million rows?
There is another way to produce random rows using only a query and without order by rand(). It involves User Defined Variables. See how to produce random rows from a table
Hola Mundo, un ejemplo para hacer RAND de los contenidos de una tabla en mysql y que es optimo para cuando tenemos muchos registros.
English: Hello World, an example to RAND for the contents of a table in mysql and that is optimal for when we have many records.
1. SELECT
2. RAND() as aleatorio,
3. producto.*
4. FROM producto
5. WHERE
6. producto.id < (SELECT
ABS(FLOOR(MAX(producto.id)/RAND((CAST(SECOND(NOW())AS
UNSIGNED)+1)/RAND()))) FROM producto
LIMIT 1)
7. ORDER BY aleatorio
8. LIMIT 3
Saludos y espero haber aportado mi granitos de arena. ;-)
English: Greetings and I hope I have contributed my bit. ;-)