tags:

views:

173

answers:

3

I am looking for the best way to retrieve the next and previous records of a record without running a full query. I have a fully implemented solution in place, and would like to know whether there are any better approaches to do this out there.

Let's say we are building a web site for a fictitious greengrocer. In addition to his HTML pages, every week, he wants to publish a list of special offers on his site. He wants those offers to reside in an actual database table, and users have to be able to sort the offers in three ways.

Every item also has to have a detail page with more, textual information on the offer and "previous" and "next" buttons. The "previous" and "next" buttons need to point to the neighboring entries depending on the sorting the user had chosen for the list.

alt text

Obviously, the "next" button for "Tomatoes, Class I" has to be "Apples, class 1" in the first example, "Pears, class I" in the second, and none in the third.

The task in the detail view is to determine the next and previous items without running a query every time, with the sort order of the list as the only available information (Let's say we get that through a GET parameter ?sort=offeroftheweek_price, and ignore the security implications).

Obviously, simply passing the IDs of the next and previous elements as a parameter is the first solution that comes to mind. After all, we already know the ID's at this point. But, this is not an option here - it would work in this simplified example, but not in many of my real world use cases.

My current approach in my CMS is using something I have named "sorting cache". When a list is loaded, I store the item positions in records in a table named sortingcache.

name (VARCHAR)             items (TEXT)

offeroftheweek_unsorted    Lettuce; Tomatoes; Apples I; Apples II; Pears
offeroftheweek_price       Tomatoes;Pears;Apples I; Apples II; Lettuce
offeroftheweek_class_asc   Apples II;Lettuce;Apples;Pears;Tomatoes

obviously, the items column is really populated with numeric IDs.

In the detail page, I now access the appropriate sortingcache record, fetch the items column, explode it, search for the current item ID, and return the previous and next neighbour.

array("current"   => "Tomatoes",
      "next"      => "Pears",
      "previous"  => null
      );

This is obviously expensive, works for a limited number of records only and creates redundant data, but let's assume that in the real world, the query to create the lists is very expensive (it is), running it in every detail view is out of the question, and some caching is needed.

My questions:

  • Do you think this is a good practice to find out the neighbouring records for varying query orders?

  • Do you know better practices in terms of performance and simplicity? Do you know something that makes this completely obsolete?

  • In CS, is there a name for this problem?

  • Is the name "Sorting cache" is appropriate and understandable for this technique?

  • Are there any recognized, common patterns to solve this problem? What are they called?

Note: My question is not about building the list, or how to display the detail view. Those are just examples. My question is the basic functionality of determining the neighbors of a record when a re-query is impossible, and the fastest and cheapest way to get there.

If something is unclear, please leave a comment and I will clarify.

A: 

So you have two tasks:

  1. build sorted list of items (SELECTs with different ORDER BY)
  2. show details about each item (SELECT details from database with possible caching).

What is the problem?

PS: if ordered list may be too big you just need PAGER functionality implemented. There could be different implementations, e.g. you may wish to add "LIMIT 5" into query and provide "Show next 5" button. When this button is pressed, condition like "WHERE price < 0.89 LIMIT 5" is added.

noonex
As I said, neither the building of the list, nor the displaying of details are my issue. My question is about the specific way of caching I outlined for getting the neighboring records, and whether anybody has better ideas on how to do that.
Pekka
+2  A: 

Here is an idea. You could offload the expensive operations to an update when the grocer inserts/updates new offers rather than when the end user selects the data to view. This may seem like a non-dynamic way to handle the sort data, but it may increase speed. And, as we know, there is always a trade off between performance and other coding factors.

Create a table to hold next and previous for each offer and each sort option. (Alternatively, you could store this in the offer table if you will always have three sort options -- query speed is a good reason to denormalize your database)

So you would have these columns:

  • Sort Type (Unsorted, Price, Class and Price Desc)
  • Offer ID
  • Prev ID
  • Next ID

When the detail information for the offer detail page is queried from the database, the NextID and PrevID would be part of the results. So you would only need one query for each detail page.

Each time an offer is inserted, updated or deleted, you would need to run a process which validates the integrity/accuracy of the sorttype table.

Jessica
This idea is very interesting, and makes the concept scalable to larger lists. It would require additional "janitorial" work (removing references to deleted items in the chain etc.) but that could be handled when the data changes. Very nice, I'll think about this!
Pekka
And welcome to SO.
Pekka
Thanks for the welcome. Glad you liked my idea.
Jessica
+1  A: 

Maybe a lightweight implementation of an indexed, doubly linked list helps you:

class dll_node {
  public $id;
  public $data;
  public $prev;
  public $next;

  function __construct($node_id, $node_data) {
    $this->id   = $node_id;
    $this->data = $node_data;
    $this->prev = null;
    $this->next = null;
  }
}

class dll_index {
  private $list;
  public $head;
  public $tail;

  function __construct() {
    $this->list = array();
    $this->head = null;
    $this->tail = null;
  }
  function append($id, $data = null) {
    if (array_key_exists($id, $this->list)) 
      return false;

    $node = new dll_node($id, $data);
    if ($this->tail === null) {
      $this->head = &$node;
    } else {
      $node->prev = &$this->tail;
      $this->tail->next = &$node;
    }
    $this->tail = &$node;
    $this->list[$id] = &$node;
    return true;
  }
  function get($id) {
    if (array_key_exists($id, $this->list))
      return $this->list[$id];
    return false;
  }
}

$sorted = new dll_index();
$sorted->append("a");
$sorted->append("b");
$sorted->append("c");

// get items by ID
$node = $sorted->get("b");
if ($node) {
  echo $node->prev->id."\n";  // "a"
  echo $node->next->id."\n";  // "c"
}
// iterate the list
$node = $sorted->head;
while ($node) {
  echo $node->id."\n";
  $node = $node->next;
}

It is implemented to be used in a write-once, forward-only manner (append(), no delete()), convenient for database query results. Put in ID values and optionally payload data. When you retrieve data by ID, you have instant access to the preceding and following element.

Cache the lists like you currently do, and that's it. It should have nice lookup performance, and it has nice usage semantics, too.

Tomalak