views: 475
answers: 9

I currently have a query that ends ORDER BY RAND(HOUR(NOW())) LIMIT 40 to get 40 random results. The list of results changes each hour.

This kills the query cache, which is damaging performance.

Can you suggest an alternative way of getting a random(ish) set of results that changes from time to time? It does not have to be every hour and it does not have to be totally random.

I would prefer a random result, rather than sorting on an arbitrary field in the table, but I will do that as a last resort...

(this is a list of new products that I want to shuffle around a bit every now and then).

+4  A: 

It's going to kill the cache because you are expecting a different result set each time; there is no way to cache a truly random set of values. If you want cacheable results, cache one large random set, and then, within each sub-section of the time window you are using those values for, do a random grab within that smaller set [outside of SQL].

monksy
Well, I only want it to change every hour or so, so during the hour it will be fixed. That is what my current query produces, with the downside that using rand() prevents the results from being cached.
rikh
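The "grab a random subset of a cached pool outside of SQL" idea can be sketched as follows (a Python illustration with made-up data; the id list stands in for a cached query result, and seeding by the hour is one way to keep the subset stable within each hour):

```python
import random
import time

# Stands in for a large cached result set (e.g. 200 product ids)
cached_pool = list(range(1, 201))

def hourly_subset(pool, n=40, when=None):
    # Seeding the RNG with the current hour makes every request in the
    # same hour see the same 40 items, while the cached pool itself
    # never changes and so stays cacheable.
    hour = time.gmtime(time.time() if when is None else when).tm_hour
    return random.Random(hour).sample(pool, min(n, len(pool)))

subset = hourly_subset(cached_pool)
```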
A: 

One way to achieve it is to shuffle the objects you map the data to. If you don't map the data to objects, you could shuffle the result array from the database. I don't know if this will perform better or not, but you will at least get the benefits from the query cache as you mention.

You could also generate a random sequence from 1 to n, and index the result array (or object array) with those.

Yngve Sneen Lindal
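Both suggestions above are easy to sketch in Python (the `rows` list is made-up data standing in for the cached result array from the database):

```python
import random

# Stands in for the cached result array coming back from the database
rows = [{"id": i, "name": "product %d" % i} for i in range(1, 101)]

# Option 1: shuffle (a copy of) the result array itself
shuffled = rows[:]          # copy so the cached array stays intact
random.shuffle(shuffled)

# Option 2: generate a random index sequence and pick rows by index
indices = random.sample(range(len(rows)), k=40)
display = [rows[i] for i in indices]
```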
A: 

you may have a column with random values that you update every hour.

Xavier Combelle
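A sketch of that idea, using an in-memory SQLite table to stand in for MySQL (in production the UPDATE would run from an hourly cron job; table and column names are illustrative):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sort_key REAL)")
conn.executemany("INSERT INTO products VALUES (?, 0)",
                 [(i,) for i in range(1, 101)])

# Hourly job: deal out fresh random sort keys to every row
conn.executemany("UPDATE products SET sort_key = ? WHERE id = ?",
                 [(random.random(), i) for i in range(1, 101)])
conn.commit()

# The hot query is now deterministic between updates, so it caches well
def top40():
    return conn.execute(
        "SELECT id FROM products ORDER BY sort_key LIMIT 40").fetchall()
```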
+3  A: 

If you have an ID column, it's better to do:

-- create a variable to hold the random row number
SET @rownum := (SELECT COUNT(*) FROM table);
SET @row := CEIL(RAND() * @rownum);

-- use the random number to select on the id column
SELECT * FROM table WHERE id = @row;

The logic of selecting the random id number can be moved to the application level.

SELECT * FROM table ORDER BY RAND() LIMIT 40

is very inefficient because MySQL processes ALL the records in the table, performing a full table scan, ordering every row randomly, and then discarding all but 40.

Yada
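The application-level version of the id trick might look like this (sketched with SQLite and made-up data; note it assumes ids run 1..COUNT(*) with no gaps, which real tables often violate once rows are deleted, so treat it as an illustration only):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, "product %d" % i) for i in range(1, 101)])

# Pick the random id in application code...
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
row_id = random.randint(1, count)  # assumes contiguous ids 1..count

# ...so the database only ever sees a cheap primary-key lookup
row = conn.execute("SELECT * FROM products WHERE id = ?",
                   (row_id,)).fetchone()
```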
A: 

Calculate the current hour in your PHP code and pass that into your query. This gives a static value for the whole hour, so the result can be cached.

Note that you might also have a hidden bug: since you're only taking the hour, you only have 24 different values, which repeat every day. That means what's showing at 1 pm today will also be showing at 1 pm tomorrow. You might want to change that.

longneck
the hour is just being used as a seed for the random number generator. Yes, I know I get the same results at 2pm each day, but that is fine (unless the list of products changes in any way)
rikh
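Computing the seed in the application keeps the SQL text constant for the whole hour, which is exactly what the query cache keys on. A minimal sketch (the table name is illustrative, since the question doesn't give one):

```python
import time

def hourly_query(when=None):
    # Same hour -> same seed -> byte-identical SQL text, so the query
    # cache can serve repeated requests within the hour.
    hour = time.gmtime(time.time() if when is None else when).tm_hour
    return "SELECT * FROM products ORDER BY RAND(%d) LIMIT 40" % hour
```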
A: 

Don't fight the cache; exploit it!

Write your query as you are (or even simpler). Then, in your code, cache the results, setting a cache expiry for 1 hour. If you are using a caching layer, like memcached, you are set. If not, you can build a fairly simple one:

[pseudocode]
global cache[24]
h = Time.hour
if (cache[h] == null) {
  cache[h] = ... run your query ...
}
return cache[h]
ndp
+2  A: 

I think a better way is to pull the product identifiers into your middle layer, choose 40 random values when you need them (once per hour, or on every request) and use them in the query: product_id IN (@id_1, @id_2, ..., @id_40).

alygin
+1 This is often a good solution, unless @rikh is running Amazon or eBay (i.e., millions of products). Having the IDs in memory might be useful for other optimizations too.
Seth
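Building that IN (...) query from the middle layer might look like this (a Python sketch; the table and column names, and the in-memory id list, are illustrative):

```python
import random

# The middle layer keeps all product ids in memory
all_product_ids = list(range(1, 501))

def build_query(ids, n=40):
    # Choose 40 random ids and emit a parameterised IN (...) clause,
    # leaving the actual values to be bound by the database driver
    chosen = random.sample(ids, n)
    placeholders = ", ".join(["?"] * len(chosen))
    sql = "SELECT * FROM products WHERE product_id IN (%s)" % placeholders
    return sql, chosen

sql, params = build_query(all_product_ids)
```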
A: 

If you only need a new set of random data once an hour, don't hit the database - save the results to your application's caching layer (or, if it doesn't have one, just put it out into a temporary file of some sort). Query cache is handy, but if you never need to even execute a query, even better...

ceejayoz
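A minimal file-based version of that cache could look like this (the `fetch` callable is whatever actually runs the query; file name and TTL are made up):

```python
import json
import os
import tempfile
import time

CACHE_PATH = os.path.join(tempfile.gettempdir(), "new_products_cache.json")
TTL_SECONDS = 3600  # one hour

def get_products(fetch):
    # If the cache file is younger than an hour, skip the database
    # entirely and serve the saved results.
    if (os.path.exists(CACHE_PATH)
            and time.time() - os.path.getmtime(CACHE_PATH) < TTL_SECONDS):
        with open(CACHE_PATH) as f:
            return json.load(f)
    rows = fetch()
    with open(CACHE_PATH, "w") as f:
        json.dump(rows, f)
    return rows
```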
+1  A: 

This is going to be a significantly nasty query if it needs to sort a large data set into a random order (which really does require a sort), then discard all but the first 40 records.

A better solution would be to just pick 40 random records. There are lots of ways of doing this and it usually depends on having keys which are evenly distributed.

Another option is to pick the 40 random records in a batch job which is only run once per hour (or whatever) and then remember which ones they are.

MarkR
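One concrete way to "just pick 40 random records" when the keys are fairly evenly distributed is to probe random points in the id range, each probe being a cheap index seek (sketched with SQLite and made-up data; note this is slightly biased toward rows that follow gaps in the id sequence):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [(i,) for i in range(1, 1001)])

lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM products").fetchone()
picked = set()
while len(picked) < 40:
    probe = random.randint(lo, hi)
    # Index seek: grab the first row at or after the random probe point
    row = conn.execute("SELECT id FROM products WHERE id >= ? LIMIT 1",
                       (probe,)).fetchone()
    if row:
        picked.add(row[0])
```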