views: 211
answers: 5

I need to display some random data (1 row) from a MySQL table, and performance is very important to me. It doesn't necessarily have to be absolutely unique or super-random data, so I have two options:

1) Query the database (my table has > 500 000 rows);

2) Automatically, once a week, create a flat text file (or a PHP file with an array) from the DB results with, let's say, 400-500 rows, and pick random results from it (probably using require_once or something like that).

Which way is better/faster?

Thank you.

+2  A: 

Query the database.

BUT not by querying

SELECT * FROM tablename ORDER BY rand() LIMIT 1

as this assigns a random number to every row, sorts the entire table by it, and only then returns a single row.

Instead, count the number of rows, pick a random number within that range, and return the row at that offset:

$numrows = mysql_num_rows();
$r = rand(0, $numrows-1);
$sql = "SELECT * FROM tablename LIMIT $r, 1";
adam
Note that counting the number of rows is slow with InnoDB.
Alex Brasetvik
Thanks, I wasn't aware of that.
adam
`mysql_num_rows()` returns the number of rows in a result set, not the number of rows in a table.
Bill Karwin
Yep, thanks, I was just providing context for the code.
adam
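Taking Bill Karwin's correction into account, a fixed-up sketch of the same idea (assuming an open mysql_* connection and the example table name `tablename`) counts the table explicitly:

$numrows = (int) mysql_result(mysql_query("SELECT COUNT(*) FROM tablename"), 0); // rows in the table, not in a result set
$r = rand(0, $numrows - 1); // random zero-based offset
$result = mysql_query("SELECT * FROM tablename LIMIT $r, 1"); // fetch the single row at that offset
$row = mysql_fetch_assoc($result);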
A: 

Some detailed discussion on the issue: http://www.depesz.com/index.php/2007/09/16/my-thoughts-on-getting-random-row/

Alex Brasetvik
The link only talks about how `ORDER BY RAND()` isn't scalable (which it isn't). Methinks this should've been a comment, not an answer.
OMG Ponies
+1  A: 

Definitely don't use the trick many people use:

SELECT * FROM MyTable ORDER BY RAND() LIMIT 1;

That query looks simple but it is sure to be a performance killer.

This might be a quick solution:

SELECT * FROM MyTable WHERE id > RAND() * (SELECT MAX(id) FROM MyTable) LIMIT 1;

This has some anomalies, such as picking rows that follow gaps more frequently. But you said you want fast, not accurate. Note that COUNT(*) is slow on transactional engines like InnoDB because it has to scan the table, while MyISAM keeps an exact row count; MAX() on an indexed id column is cheap on either engine.

Bill Karwin
I believe your second query would be even slower than your first, as it generates a rand and evaluates the subquery against the table for every single row
adam
@Adam: You'd think so, but in reality the 2nd query is more scalable - see: http://stackoverflow.com/questions/1823306/alerternative-to-mysql-order-by-rand
OMG Ponies
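A common refinement of the same idea, sketched here on the assumption of an indexed integer `id` column and an open mysql_* connection, keeps the single random point but makes the "first row at or above it" explicit:

$sql = "SELECT t.*
        FROM MyTable AS t
        JOIN (SELECT FLOOR(RAND() * MAX(id)) AS rand_id FROM MyTable) AS r
          ON t.id >= r.rand_id
        ORDER BY t.id
        LIMIT 1";
$row = mysql_fetch_assoc(mysql_query($sql)); // still biased toward rows after id gaps, but no full-table sort

The bias toward rows that follow gaps remains, but the outer query can be satisfied from the index on id instead of sorting the whole table.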
A: 

There's plenty of material on this site and on Google about how to efficiently fetch a random row, so that should take care of your "how to properly write the query" question.

To your second part: if you want the same result for the whole week/day/hour, you could easily write a cronjob that takes the result and writes it to a file, and have your application pull from that file. The next time the cronjob runs, it will overwrite the old version of the file, giving your application the new result. This cronjob could even generate your HTML (or whatever) and put the static page in your web hierarchy, letting you benefit from your web server's caching. That caching should make the disk-versus-database I/O question moot, and may even help you if your database is usually under heavy load.

Sam Bisbee
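As a rough sketch of that approach (the file path, table name, and open mysql_* connection are assumptions, not part of the answer):

// cron_pick_random.php -- meant to be run once a week from cron
$numrows = (int) mysql_result(mysql_query("SELECT COUNT(*) FROM tablename"), 0);
$r = rand(0, $numrows - 1);
$row = mysql_fetch_assoc(mysql_query("SELECT * FROM tablename LIMIT $r, 1"));
file_put_contents('/var/cache/app/random_row.json', json_encode($row));

// In the application: read the precomputed row instead of querying the database.
$row = json_decode(file_get_contents('/var/cache/app/random_row.json'), true);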
A: 

If your table has an incremental unique ID then just

SELECT * FROM table WHERE id = $r

with $r being your random number, obtained via the suggestions above.

Gazzer
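One caveat: if the id sequence has gaps from deleted rows, a lookup like this can come back empty. A small retry loop is a cheap workaround (sketch only, again assuming an open mysql_* connection and the example table name `tablename`):

$maxid = (int) mysql_result(mysql_query("SELECT MAX(id) FROM tablename"), 0);
do {
    $r = rand(1, $maxid);
    $row = mysql_fetch_assoc(mysql_query("SELECT * FROM tablename WHERE id = $r"));
} while ($row === false); // retry when the random id falls in a gap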