tags:

views:

2376

answers:

12

What is a fast way to select a random row from a large mysql table?

I'm working in php, but I'm interested in any solution even if it's in another language.

+3  A: 

An easy but slow way would be (good for smallish tables)

SELECT * from TABLE order by RAND() LIMIT 1
Vinko Vrsalovic
This will produce a random value for all the rows in the table, a sort, and then grabbing one row. This is not quick.
Lasse V. Karlsen
True. It's quick in development time though. (and in answer time :-) ). I'll leave it here for non big table users who might need it
Vinko Vrsalovic
+8  A: 

Grab all the id's, pick a random one from it, and retrieve the full row.

If you know the id's are sequential without holes, you can just grab the max and calculate a random id.

If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.

If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.

Don't do that, nor should you order by a GUID, it has the same problem.

Lasse V. Karlsen
A: 

With a order yo will do a full scan table. Its best if you do a select count(*) and later get a random row=rownum between 0 and the last registry

MazarD
+1  A: 

In pseudo code:

sql "select id from table"
store result in list
n = random(size of list)
sql "select * from table where id=" + list[n]

This assumes that id is a unique (primary) key.

Anders Sandvig
If the IDs don't change frequently you can even keep the list of IDs in memory to make things faster.
Anders Sandvig
What if there are a billion rows? That means your list variable is huge.
Bill Karwin
+8  A: 

I knew there had to be a way to do it in a single query in a fast way. And here it is:

A fast way without involvement of external code, kudos to

http://jan.kneschke.de/projects/mysql/order-by-rand/

SELECT name
  FROM random AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1;
Vinko Vrsalovic
Note the tradeoff here in that, to be assured of getting a result on the first try, any keys which are preceded by gaps will be more likely to be selected. e.g., Given two records with keys 1 and 10, the record with 10 as its key will be selected 90% of the time.
Dave Sherohman
Yes, you can get a better distribution if the keys are without gaps and avoiding the WHERE and ORDER BY clauses. Check the article, it's all pretty well explained there. I didn't want to steal all of it, thus didn't put the other queries, pros and cons of each.
Vinko Vrsalovic
+2  A: 

Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.

SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
Bill Karwin
+10  A: 

MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)

This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).

CesarB
A: 

Quick and dirty method:

SET @COUNTER=SELECT COUNT(*) FROM your_table;

SELECT PrimaryKey FROM your_table LIMIT 1 OFFSET (RAND() * @COUNTER);

The complexity of the first query is O(1) for MyISAM tables.

The second query accompanies a table full scan. Complexity = O(n)

Dirty and quick method:

Keep a separate table for this purpose only. You should also insert the same rows to this table whenever inserting to the original table. Assumption: No DELETEs.

CREATE TABLE Aux(MyPK INT AUTO_INCREMENT, PrimaryKey INT);

SET @MaxPK = (SELECT MAX(MyPK) FROM Aux);

SET @RandPK = CAST(RANDOM() * @MaxPK, INT)

SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = @RandPK);

If DELETEs are allowed,

SET @delta = CAST(@RandPK/10, INT);

SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK BETWEEN @RandPK - @delta AND @RandPK + @delta LIMIT 1;

The overall complexity is O(1).

yogman
A: 

Hi there.

I'm a bit new to SQL but how about generating a random number in PHP and using

SELECT * FROM the_table WHERE primary_key >= $randNr

this doesn't solve the problem with holes in the table.

but here's a twist on lassevks suggestion:

SELECT primary_key FROM the_table

use mysql_num_rows() in PHP create a random number based on the above result.

SELECT * FROM the_table WHERE primary_key = rand_number

on a side note just how slow is SELECT * FROM the_table

creating a random number based on mysql_num_rows()

and then moving the data pointer to that point mysql_data_seek()

Just how slow will this be on large tables with say a million rows?

A: 

For selecting multiple random rows from a given table (say 'words'), our team came up with this beauty:

SELECT * FROM words AS r1 JOIN (SELECT MAX(WordID) as wid_c FROM words) as tmp1 WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id) LIMIT n

A: 

There is another way to produce random rows using only a query and without order by rand(). It involves User Defined Variables. See how to produce random rows from a table

Ilan Hazan
A: 

Hola Mundo, un ejemplo para hacer RAND de los contenidos de una tabla en mysql y que es optimo para cuando tenemos muchos registros.

English: Hello World, an example to RAND for the contents of a table in mysql and that is optimal for when we have many records.

 1. SELECT 
 2. RAND() as aleatorio,
 3. producto.* 
 4. FROM producto
 5. WHERE
 6. producto.id < (SELECT
    ABS(FLOOR(MAX(producto.id)/RAND((CAST(SECOND(NOW())AS
    UNSIGNED)+1)/RAND()))) FROM producto
    LIMIT 1)
 7. ORDER BY aleatorio
 8. LIMIT 3

Saludos y espero haber aportado mi granitos de arena. ;-)

English: Greetings and I hope I have contributed my bit. ;-)

Navegante