views:

243

answers:

3

I have mysql database structure like below:

CREATE TABLE test (
    id int(11) NOT NULL auto_increment,
    title text NULL,
    tags text NULL,
    PRIMARY KEY (id)
);

data on field tags is stored as a comma separated text like html,php,mysql,website,html etc... now I need create an array that contains around 50 randomly selected tags from random records.

currently I am using rand() to select 15 random mysql data from database and then holding all the tags from 15 records in an array. Then I am using array_rand() for randomizing the array and selecting only 50 random tags.

$query=mysql_query("select * from test order by id asc, RAND() limit 15");
$tags="";
while ($eachData=mysql_fetch_array($query)) {
    $additionalTags=$eachData['tags'];
    if ($tags=="") {
        $tags.=$additionalTags;
    } else {
        $tags.=$tags.",".$additionalTags;
    }
}

$tags=explode(",", $tags);
$newTags=array();
foreach ($tags as $tag) {
    $tag=trim($tag);
    if ($tag!="") {
        if (!in_array($tag, $newTags)) {
            $newTags[]=$tag;
        }
    }
}

$random_newTags=array_rand($newTags, 50);

Now I have huge records on the database, and because of that; rand() is performing very slow and sometimes it doesn't work. So can anyone let me know how to handle this situation correctly so that my page will work normally.

+2  A: 

Never ORDER BY RAND() - it's horrible for performance. Instead do the randomizing in PHP. Something like this, since your ID is auto incrementing (may not be the best approach):

$count = mysql_fetch_assoc(mysql_query("select count(1) as count from test"));
$range = range(0, $count['count']);

$selection = array_rand($range, 50);
$sel_list = implode(',', $selection);

$query = mysql_query("select * from test where id in ($sel_list)");

By the way, why are you putting your tags in a string list, only to explode that string later? Just put them into an array from the start.

Tesserex
This assumes no gaps in the `auto_increment` column. If you have gaps, you could select the records individually. It will still be fast because you'll be using the primary key index for each query. `"SELECT * FROM test WHERE id >= $selection LIMIT 1"`. Alternatively, if you don't get 15 rows back, just get some more.
Marcus Adams
Prakash
great, it's working like a charm with little bit modification to fit my page additional requirement.
Prakash
A: 

I think that you're probably aware of why ORDER BY RAND() is slow. The query reads all the records, then orders them without the aid of an index.

If you select a random number between 0 and MAX(id) - 15 and get the next 15 rows, will it be sufficiently random? Are the records entered in an unordered way?

SELECT * FROM test
WHERE id >= my_random_value
ORDER BY id
LIMIT 15
Marcus Adams
First I need to select 15 random records from the table. Then holding the data of field "tags" in an array, I need to print 50 Random tags. I think your example will not work because sometimes the returned value for my_random_value may not exist on the database.
Prakash
I was going to take this approach, but then I figured having 15 entries with sequential ID wasn't random enough.
Tesserex
@Prakash, please note the `>=` in the `WHERE` clause. If you select a random value between (inclusive) the smallest ID value and the largest id value (minus 15), then select the next 15 records where the id is greater than or equal to the random value, you'll always get at least one record, and most of the time, you'll get 15 records. If you get less than 15, then repeat until you have 15.
Marcus Adams
A: 

Marcus procedure can be improved, for example

SELECT * FROM test
WHERE id % round(rand()*(SELECT count(*) FROM test)) = 0
ORDER BY id
LIMIT 15

(and it is not too slow). The only problem is that the above expression does not guarantee that you will get 15 records. Would you like something like this? It could be improved to guarantee 15 records.

Unreason
this sounds good but is not returning more records. Sometimes it returned empty and sometimes just few rows and because of it I am unable to get 50 tags at final. Is there anyway to get more rows?
Prakash
yes there is, by lowering the divisor - the SELECT count(*) FROM test; and putting in a smaller value there you will have a higher chance of selecting the record, but still no guarantees.
Unreason