tags:

views:

192

answers:

3

Hi, which is more efficient (when managing over 100K records):

A. Mysql

SELECT * FROM user ORDER BY RAND();

of course, after that i would already have all the fields from that record.

B. PHP

use memcached to have $cache_array hold all the data from "SELECT id_user FROM user ORDER BY id_user" for 1 hour or so... and then:

$id = array_rand($cache_array);

of course, after that i have to make a MYSQL call with:

SELECT * FROM user WHERE id_user = $id;

so... which is more efficient? A or B?

+4  A: 

The first one is incredibly slow because

MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.

It's elaborated more on this blog post.

ryeguy
+6  A: 

The proper way to answer this kind of question is to do a benchmark. Do a quick and dirty implementation each way and then run benchmark tests to determine which one performs better.

Having said that, ORDER BY RAND() is known to be slow because it's impossible for MySQL to use an index. MySQL will basically run the RAND() function once for each row in the table and then sort the rows based on what came back from RAND().

Your other idea of storing all user_ids in memcached and then selecting a random element form the array might perform better if the overhead of memcached proves to be less than the cost of a full table scan. If your dataset is large or staleness is a problem, you may run into issues though. Also you're adding some complexity to your application. I would try to look for another way.

I'll give you a third option which might outperform both your suggestions: Select a count(user_id) of the rows in your user table and then have php generate a random number between 0 and the result of count(user_id) minus 1, inclusive. Then do a SELECT * FROM user LIMIT 1 OFFSET random-number-generated-by-php;.

Again, the proper way to answer these types of questions is to benchmark. Anything else is speculation.

Asaph
+1 for the third option
adam
this would work great, except that i forgot to specify that the records that can be selected must be filtered with a WHERE flg_pic=1. If i do a count(id_user) the records that have flg_pic=0 would also be considered :s
andufo
@andufo: This is not really a problem at all. You can do `SELECT COUNT(user_id) FROM user WHERE flg_pic=1;` and then `SELECT * FROM user WHERE flg_pic=1 LIMIT 1 OFFSET random-number-generated-by-php;`.
Asaph
FYI: I realized that the random number must be between `0` and `count(user_id) - 1` inlcusive, and *not* between `1` and `count(user_id)` as I originally stated. I updated my answer. Just wanted to bring that to your attention to avoid a bug.
Asaph
@Asaph wow... these are the results with 150k records: OFFSET took 0.0017 secs and BY RAND() took 1.6337 sec -- Thanks!
andufo
@andufo: Good work doing the benchmark to prove it!
Asaph
A: 
$random_no = mt_rand(0, $total_record_count);

$query = "SELECT * FROM user ORDER BY __KEY__ LIMIT {$random_no}, 1";
tastyone