views: 475
answers: 9

I currently have a query that ends ORDER BY RAND(HOUR(NOW())) LIMIT 40 to get 40 random results. The list of results changes each hour.

This kills the query cache, which is damaging performance.

Can you suggest an alternative way of getting a random(ish) set of results that changes from time to time? It does not have to be every hour and it does not have to be totally random.

I would prefer a random result, rather than sorting on an arbitrary field in the table, but I will do that as a last resort...

(this is a list of new products that I want to shuffle around a bit every now and then).

+4  A: 

It's going to kill the cache because you are expecting a different result set each time; there is no way to cache a truly random set of values. If you want cacheable results, cache one large random set, and then, within each sub-section of the time window you are using those values for, do a random grab within that smaller set [outside of SQL].

monksy
Well, I only want it to change every hour or so, so during the hour it will be fixed. That is what my current query produces, with the downside that using rand() prevents the results from being cached.
rikh
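The "grab a random subset of a cached pool outside of SQL" idea can be sketched as follows (a Python illustration with made-up data; the id list stands in for a cached query result, and seeding by the hour is one way to keep the subset stable within each hour):

```python
import random
import time

# Stands in for a large cached result set (e.g. 200 product ids)
cached_pool = list(range(1, 201))

def hourly_subset(pool, n=40, when=None):
    # Seeding the RNG with the current hour makes every request in the
    # same hour see the same 40 items, while the cached pool itself
    # never changes and so stays cacheable.
    hour = time.gmtime(time.time() if when is None else when).tm_hour
    return random.Random(hour).sample(pool, min(n, len(pool)))

subset = hourly_subset(cached_pool)
```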
A: 

One way to achieve it is to shuffle the objects you map the data to. If you don't map the data to objects, you could shuffle the result array from the database. I don't know if this will perform better or not, but you will at least get the benefits from the query cache as you mention.

You could also generate a random sequence from 1 to n, and index the result array (or object array) with those.

Yngve Sneen Lindal
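Both suggestions above are easy to sketch in Python (the `rows` list is made-up data standing in for the cached result array from the database):

```python
import random

# Stands in for the cached result array coming back from the database
rows = [{"id": i, "name": "product %d" % i} for i in range(1, 101)]

# Option 1: shuffle (a copy of) the result array itself
shuffled = rows[:]          # copy so the cached array stays intact
random.shuffle(shuffled)

# Option 2: generate a random index sequence and pick rows by index
indices = random.sample(range(len(rows)), k=40)
display = [rows[i] for i in indices]
```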
A: 

you may have a column with random values that you update every hour.

Xavier Combelle
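A sketch of that idea, using an in-memory SQLite table to stand in for MySQL (in production the UPDATE would run from an hourly cron job; table and column names are illustrative):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sort_key REAL)")
conn.executemany("INSERT INTO products VALUES (?, 0)",
                 [(i,) for i in range(1, 101)])

# Hourly job: deal out fresh random sort keys to every row
conn.executemany("UPDATE products SET sort_key = ? WHERE id = ?",
                 [(random.random(), i) for i in range(1, 101)])
conn.commit()

# The hot query is now deterministic between updates, so it caches well
def top40():
    return conn.execute(
        "SELECT id FROM products ORDER BY sort_key LIMIT 40").fetchall()
```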
+3  A: 

If you have an ID column, it's better to do:

-- create a variable to hold the random row number
SET @rownum := (SELECT COUNT(*) FROM table);
SET @row := CEIL(RAND() * @rownum);

-- use the random number to select on the id column
SELECT * FROM table WHERE id = @row;

The logic of selecting the random id number can be moved to the application level.

SELECT * FROM table ORDER BY RAND() LIMIT 40

is very inefficient because MySQL processes ALL the records in the table, performing a full table scan, ordering every row randomly, and then discarding all but 40.

Yada
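The application-level version of the id trick might look like this (sketched with SQLite and made-up data; note it assumes ids run 1..COUNT(*) with no gaps, which real tables often violate once rows are deleted, so treat it as an illustration only):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, "product %d" % i) for i in range(1, 101)])

# Pick the random id in application code...
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
row_id = random.randint(1, count)  # assumes contiguous ids 1..count

# ...so the database only ever sees a cheap primary-key lookup
row = conn.execute("SELECT * FROM products WHERE id = ?",
                   (row_id,)).fetchone()
```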
A: 

Calculate the current hour in your PHP code and pass that into your query. This gives a static value for the whole hour, so the result can be cached.

Note that you might also have a hidden bug: since you're only taking the hour, you only have 24 different values, which repeat every day. That means what's showing at 1 pm today will also be showing at 1 pm tomorrow. You might want to change that.

longneck
the hour is just being used as a seed for the random number generator. Yes, I know I get the same results at 2pm each day, but that is fine (unless the list of products changes in any way)
rikh
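Computing the seed in the application keeps the SQL text constant for the whole hour, which is exactly what the query cache keys on. A minimal sketch (the table name is illustrative, since the question doesn't give one):

```python
import time

def hourly_query(when=None):
    # Same hour -> same seed -> byte-identical SQL text, so the query
    # cache can serve repeated requests within the hour.
    hour = time.gmtime(time.time() if when is None else when).tm_hour
    return "SELECT * FROM products ORDER BY RAND(%d) LIMIT 40" % hour
```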
A: 

Don't fight the cache; exploit it!

Write your query as you are (or even simpler). Then, in your code, cache the results, setting a cache expiry for 1 hour. If you are using a caching layer, like memcached, you are set. If not, you can build a fairly simple one:

[pseudocode]
global cache[24]
h = Time.hour
if (cache[h] == null) {
  cache[h] = ... run your query ...
}
return cache[h]
ndp
+2  A: 

I think a better way is to pull the product identifiers into your middle layer, choose 40 random values when you need them (once per hour, or on every request) and use them in the query: product_id IN (@id_1, @id_2, ..., @id_40).

alygin
+1 This is often a good solution, unless @rikh is running Amazon or eBay (i.e., millions of products). Having the IDs in memory might be useful for other optimizations too.
Seth
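Building that IN (...) query from the middle layer might look like this (a Python sketch; the table and column names, and the in-memory id list, are illustrative):

```python
import random

# The middle layer keeps all product ids in memory
all_product_ids = list(range(1, 501))

def build_query(ids, n=40):
    # Choose 40 random ids and emit a parameterised IN (...) clause,
    # leaving the actual values to be bound by the database driver
    chosen = random.sample(ids, n)
    placeholders = ", ".join(["?"] * len(chosen))
    sql = "SELECT * FROM products WHERE product_id IN (%s)" % placeholders
    return sql, chosen

sql, params = build_query(all_product_ids)
```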
A: 

If you only need a new set of random data once an hour, don't hit the database - save the results to your application's caching layer (or, if it doesn't have one, just put it out into a temporary file of some sort). Query cache is handy, but if you never need to even execute a query, even better...

ceejayoz
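A minimal file-based version of that cache could look like this (the `fetch` callable is whatever actually runs the query; file name and TTL are made up):

```python
import json
import os
import tempfile
import time

CACHE_PATH = os.path.join(tempfile.gettempdir(), "new_products_cache.json")
TTL_SECONDS = 3600  # one hour

def get_products(fetch):
    # If the cache file is younger than an hour, skip the database
    # entirely and serve the saved results.
    if (os.path.exists(CACHE_PATH)
            and time.time() - os.path.getmtime(CACHE_PATH) < TTL_SECONDS):
        with open(CACHE_PATH) as f:
            return json.load(f)
    rows = fetch()
    with open(CACHE_PATH, "w") as f:
        json.dump(rows, f)
    return rows
```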
+1  A: 

This is going to be a significantly nasty query if it needs to sort a large data set into a random order (which really does require a sort), then discard all but the first 40 records.

A better solution would be to just pick 40 random records. There are lots of ways of doing this and it usually depends on having keys which are evenly distributed.

Another option is to pick the 40 random records in a batch job which is only run once per hour (or whatever) and then remember which ones they are.

MarkR
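One concrete way to "just pick 40 random records" when the keys are fairly evenly distributed is to probe random points in the id range, each probe being a cheap index seek (sketched with SQLite and made-up data; note this is slightly biased toward rows that follow gaps in the id sequence):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [(i,) for i in range(1, 1001)])

lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM products").fetchone()
picked = set()
while len(picked) < 40:
    probe = random.randint(lo, hi)
    # Index seek: grab the first row at or after the random probe point
    row = conn.execute("SELECT id FROM products WHERE id >= ? LIMIT 1",
                       (probe,)).fetchone()
    if row:
        picked.add(row[0])
```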