views:

398

answers:

3

I've been reading a lot about the disadvantages of using "order by rand" so I don't need update on that. I was thinking, since I only need a limited amount of rows retrieved from the db to be randomized, maybe I should do:

$r = $db->query("select * from table limit 500");
for($i;$i<500;$i++)
$arr[$i]=mysqli_fetch_assoc($r);
shuffle($arr);

(i know this only randomizes the 500 first rows, be it).

would that be faster than

$r = $db->("select * from table order by rand() limit 500");

let me just mention, say the db tables were packed with more than...10,000 rows.

why don't you do it yourself?!? - well, i have, but i'm looking for your experienced opinion.

thanks!

A: 

I am looking at this from experience with MySQL.

Let's talk about the first piece of code:

$r = $db->query("select * from table");
for($i=0;$i<500;$i++){
  $arr[$i] = mysqli_fetch_assoc($r);
}
shuffle($arr);

Clearly it would be more efficient to LIMIT the number of rows in the SQL statement instead of doing it on PHP.

Thus:

$r = $db->query("SELECT * FROM table LIMIT 500");
while($arr[] = mysqli_fetch_assoc($r)){}
shuffle($arr);

SQL operation would be faster than doing it in PHP, especially when you have such large amount of rows. One good way to find out is to do benchmarking and find out which of the two would be faster. My bet is that the SQL would be faster than shuffling in PHP.

So my vote goes for:

$r = $db->query("SELECT * FROM table ORDER BY RAND() LIMIT 500");
while($arr[] = mysqli_fetch_assoc($r)){}
thephpdeveloper
RAND() isn't scalable.. once you hit rows by the thousands, it slows down dramatically.
Tor Valamo
thank you! and I saw that mistake you mentioned and corrected it.
sombe
however between shuffle and RAND(), RAND() definitely fare better than shuffle..
thephpdeveloper
Well, shuffling is O(*n*), sorting is O(*n* log *n*). I'd say shuffle would actually be faster.
Joey
A: 

I'm pretty sure the shuffle takes longer in your case, but you may wanna see this link for examples on fast random sets from the database. It requires a bit of extra SQL, but if speed is important to you, then do this.

http://devzone.zend.com/article/4571-Fetching-multiple-random-rows-from-a-database

Tor Valamo
+1  A: 

500 or 10K, the sample size is too small to be able to draw tangible conclusions. At 100K, you're still looking at the 1/2 second region on this graph. If you're still concerned with performance, look at the two options for a randomized number I provided in this answer.

We don't have your data or setup, so it's left to you to actually test the situation. There are numerous pages for how to calculate elapsed time in PHP - create two pages, one using shuffle and the other using the RAND() query. Run at least 10 of each, & take a look.

OMG Ponies
thank you, I've actually read that thread a couple of times before posting this question.
sombe