Hey Everyone,

I currently have a listing of data that is displayed in a randomly generated order. Right now there is no pagination, so it is easy to generate a randomly ordered list on every load. As the data grows, though, this gets slower, so the obvious solution is pagination. The issue I have with pagination is that I cannot regenerate the random order every time the page loads, so my manager and I have come to the conclusion that a list will have to be pre-generated ahead of time and regenerated every x amount of time. Now the issue is how we store this generated list. There are four options that we've come up with:

  1. Session (takes up RAM on the server)
  2. Cookies (more data is transmitted; think thousands of integer values sent to the user)
  3. Flat file (implementation might take a bit of time; not an extreme amount, but a little longer than the rest)
  4. Database (a cron job will run every x amount of time and do a mass update on all records; our worry is that if there is too much data, the update might slow down the system for people hitting the server while it runs)

If there are any other solutions which seem better than pre-generated time-based lists, I'd love to hear about them.

Thanks.

UPDATE: An answer that I really liked, but which was deleted for some reason, mentioned the use of a SEED. With a seed I can store the seed instead of a list of ids, which cuts down my data storage and simplifies everything. I just tested the solution, and it works almost flawlessly. The only problem is that when I use LIMIT, everything screws up. Does anyone have any suggestions about that? I don't want to have to generate all the data every time; I just want to use LIMIT offset, count. If I use that with a seed, though, the sequence always resets, as it should (the seed restarts the generator on every query).

Hopefully that made sense. It made more sense in my head than it does typed out.

A: 

Use #4, possibly storing only the IDs of the data to retrieve, in the order they should be retrieved.

Better than that, if possible (since this is essentially a scaling issue), is to pre-generate each page's data. For example, if it's only viewed through a browser, just go ahead and pre-generate x number of static HTML pages (or just the table/list portion).

I know that this probably sounds ridiculous without further explanation, but think about one of these options.

Also, if the server is taking that hard of a hit during generation of the results, that work needs to be separated from the web server, and the report/generation possibly run against a clone/replicated slave of the original database.
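
A minimal sketch of what #4 could look like (the random_order and items table names here are assumptions for illustration, not anything from the question):

// Cron script: rebuild the pre-randomized ordering. The AUTO_INCREMENT
// position column records each row's place in the shuffled list.
mysql_query('CREATE TABLE IF NOT EXISTS random_order (
               position INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
               item_id  INT NOT NULL)');
mysql_query('TRUNCATE TABLE random_order');
mysql_query('INSERT INTO random_order (item_id)
             SELECT id FROM items ORDER BY RAND()');

// A page request then reads a slice in the stored order,
// e.g. page 3 at 20 rows per page:
$result = mysql_query('SELECT i.* FROM random_order o
                         JOIN items i ON i.id = o.item_id
                        ORDER BY o.position
                        LIMIT 40, 20');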

anonymous coward
I lean more towards #4 as well. With flat files we would take the same performance hit as with a cron job, but the flat file guarantees that a user feels that hit every time the pre-generated list is rebuilt, whereas a cron job does it in the background; for a user to feel the hit, they'd have to hit the site at the exact moment the database is being updated.
JohnathanKong
I'm not sure how often you have to "update" the generated listing, but is it not an option to generate it from a replicated/slave database of some sort, so that the "real" database takes no hit at all and the users *never* notice? Is the bottleneck in CPU/random ordering, or elsewhere? Do you know?
anonymous coward
A: 

MySQL's RAND() accepts a seed as an optional argument. Given a seed, it will return the same randomized result set each time.

What you could do is generate a random seed in PHP on the first page request and pass it along to each subsequent page in the query string.
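
A minimal sketch of that approach (the items table and the seed/page parameter names are assumptions for illustration):

// Pick a seed on the first request; reuse the one from the query
// string on later pages so every page sees the same random order.
$seed = isset($_GET['seed']) ? (int)$_GET['seed'] : mt_rand();
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$per_page = 20;

// The integer seed makes ORDER BY RAND() deterministic across requests.
$result = mysql_query(sprintf('SELECT * FROM items ORDER BY RAND(%d) LIMIT %d, %d',
                              $seed, ($page - 1) * $per_page, $per_page));

// Link to the next page, carrying the seed along in the query string.
echo '<a href="?seed=' . $seed . '&page=' . ($page + 1) . '">Next</a>';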

Edit: Sorry, I didn't realize this solution had already been posted and then deleted.

Kevin
Yup, but the issue with this solution, which I love, is that I can't LIMIT a specific slice of the results, so as the user goes down the pages, poor efficiency starts to kick in. For example, SELECT * FROM my_table ORDER BY RAND(seed) LIMIT 1, 5 gives me the same results as SELECT * FROM my_table ORDER BY RAND(seed) LIMIT 6, 5, which shouldn't be the case.
JohnathanKong
Not sure; it seems to be working when I tested it. Are you saying the queries work fine, but are not efficient?
Kevin
What I meant was that I was having issues selecting sections of the results using LIMIT. But I think I've come to a solution at this point: use the id and a seed to generate a unique, deterministically calculated random number per row. This will allow me to pick from the middle. I will do more testing, and if this works, this random seed will be the solution.
JohnathanKong
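(A sketch of what that per-row calculation could look like; MD5 over the id plus the seed is an assumption here, since JohnathanKong doesn't say exactly which function he used:

// Order by a hash of the row id and the seed: every request with the
// same seed sees the same ordering, and LIMIT can pick from the middle.
$seed = (int)$_GET['seed'];
$result = mysql_query(sprintf("SELECT * FROM items ORDER BY MD5(CONCAT(id, '%d')) LIMIT 40, 20",
                              $seed));
)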
I could be missing something very obvious here, but is this not the desired result? http://i.imgur.com/5qDSA.png
Kevin
Yup, that is the exact required result. Unfortunately, according to geocar (poster below), my MySQL might have a bug. I don't like updating live servers unless I have to, so unfortunately I'll have to use this workaround for now, until I can find a good enough excuse to upgrade the MySQL servers.
JohnathanKong
A: 

Use a subselect. This way you can still use the RAND() trick, but OFFSET won't mess you up.

select * from (select * from items order by rand(3) asc) as b limit 5 offset @x;

Alternate hint: Upgrade your MySQL. This was an old bug.


Example:

mysql> select * from (select id from items order by rand(3) asc) as b limit 5 offset 1;
+-------+
| id    |
+-------+
| 24621 | 
| 25214 | 
| 27119 | 
| 24672 | 
| 25585 | 
+-------+
5 rows in set (0.01 sec)

mysql> select * from (select id from items order by rand(3) asc) as b limit 5 offset 2;
+-------+
| id    |
+-------+
| 25214 | 
| 27119 | 
| 24672 | 
| 25585 | 
| 27718 | 
+-------+
5 rows in set (0.01 sec)

mysql> select * from (select id from items order by rand(3) asc) as b limit 5 offset 3;
+-------+
| id    |
+-------+
| 27119 | 
| 24672 | 
| 25585 | 
| 27718 | 
| 25457 | 
+-------+
5 rows in set (0.00 sec)

mysql> select * from (select id from items order by rand(3) asc) as b limit 5 offset 4;
+-------+
| id    |
+-------+
| 24672 | 
| 25585 | 
| 27718 | 
| 25457 | 
| 27112 | 
+-------+
5 rows in set (0.01 sec)

mysql> select * from (select id from items order by rand(3) asc) as b limit 5 offset 5;
+-------+
| id    |
+-------+
| 25585 | 
| 27718 | 
| 25457 | 
| 27112 | 
| 24779 | 
+-------+
5 rows in set (0.02 sec)
geocar
RAND() seems to ignore the offset, which makes sense in a way: no matter how many rows we offset by, the generator restarts when the query runs, so the first generated row is always the same one.
JohnathanKong
JohnathanKong: I just posted an example to the contrary.
geocar
A: 

I prefer a flat file. Have a look at this caching class taken from OpenCart:

<?php
// DIR_CACHE is an OpenCart constant pointing at the cache directory.
final class Cache {
  private $expire = 3600; // lifetime of a cache entry, in seconds

  public function __construct() {
    // On startup, purge every cache file whose expiry timestamp
    // (stored as the last part of the file name) has passed.
    $files = glob(DIR_CACHE . 'cache.*');
    if ($files) {
      foreach ($files as $file) {
        $time = substr(strrchr($file, '.'), 1);
        if ($time < time()) { unlink($file); }
      }
    }
  }

  public function get($key) {
    // Return the unserialized contents of the first matching cache
    // file, or null if the key has no entry.
    $files = glob(DIR_CACHE . 'cache.' . $key . '.*');
    if ($files) {
      foreach ($files as $file) {
        $handle = fopen($file, 'r');
        $cache = fread($handle, filesize($file));
        fclose($handle);
        return unserialize($cache);
      }
    }
  }

  public function set($key, $value) {
    // Replace any existing entry, naming the new file after the key
    // plus its expiry timestamp.
    $this->delete($key);
    $file = DIR_CACHE . 'cache.' . $key . '.' . (time() + $this->expire);
    $handle = fopen($file, 'w');
    fwrite($handle, serialize($value));
    fclose($handle);
  }

  public function delete($key) {
    // Remove every cache file belonging to this key.
    $files = glob(DIR_CACHE . 'cache.' . $key . '.*');
    if ($files) {
      foreach ($files as $file) {
        unlink($file);
      }
    }
  }
}
?>

It's really simple to use and it works well: you run your query, save the data into the file, and shuffle it in PHP. Here's an example:

$cache = new Cache();
$data = $cache->get('my_query_key');
if (!$data) {
  // Run the query and put the rows into an array (so shuffle() can be used :P)
  $result = mysql_query('SELECT * FROM items');
  $data = array();
  while ($row = mysql_fetch_assoc($result)) { $data[] = $row; }
  $cache->set('my_query_key', $data);
}
shuffle($data);
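
Note that shuffle() on every request produces a different order on each page load; if the order needs to stay stable across pages (as in the question), one variation, assuming a hypothetical page GET parameter and 20 rows per page, is to cache the already-shuffled array and slice it:

$cache = new Cache();
$data = $cache->get('my_random_order');
if (!$data) {
  $result = mysql_query('SELECT * FROM items');
  $data = array();
  while ($row = mysql_fetch_assoc($result)) { $data[] = $row; }
  shuffle($data);                        // shuffle once...
  $cache->set('my_random_order', $data); // ...then cache the order for an hour
}
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$rows = array_slice($data, ($page - 1) * 20, 20);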

The only issue is that there can supposedly be a problem when saving more than 100 KB to a file, but that's just a rumor; I use this and it works very well, and it has never given me a problem. Ah, and in this case there's no need to use RAND() in the query. :P

I wrote this post without checking the syntax, beware ^^

Paper-bat