views:

233

answers:

3

I'm writing a PHP function that would use a table of sorts to look up which DB shard the application should go to, based on the datestamp I have.

The shard configuration is something like this (pseudo-code): the first column is the date of the event I'm looking for and the 2nd is the shard the event resides in.

pre-2008 -> shard1
2008-2009 -> shard2
2009_01-2009_06 -> shard3
2009_07 -> shard4
2009_08 -> shard5
2009_09 and up -> shard6

As you can see, the configuration I want is pretty flexible - it can take any date range, however small or big and map to a shard.

I am looking for the quickest way to do a lookup based on a given date.

For example, if my date is 2009-05-02, the shard I am after is shard3. If the date is 2007-08-01, then it's shard1.

Bonus points for actual PHP code, as the application is in PHP.

Thank you.

A: 

Since your shards can be strictly ordered, it seems like storing them in a binary tree and then simply running a binary search through that tree would give you the fastest results.

Amber
+2  A: 
<?php

function get_shard($datetime)
{
    $timestamp = strtotime($datetime);

    $shards = array(array('start' => null, 'end' => '2007-12-31'),
                    array('start' => '2008-01-01', 'end' => '2008-12-31'),
                    array('start' => '2009-01-01', 'end' => '2009-06-30'),
                    array('start' => '2009-07-01', 'end' => '2009-07-31'),
                    array('start' => '2009-08-01', 'end' => '2009-08-31'),
                    array('start' => '2009-09-01', 'end' => null),
                    );
    foreach ($shards as $key => $range) {
        $start = strtotime($range['start']);
        $end = strtotime($range['end']);
        if ($timestamp >= $start && $timestamp <= $end) {
            return $key + 1;
        }
        if ($timestamp >= $start && $end === false) {
            return $key + 1;
        }
    }
}


$datetime = '2007-08-01';
echo 'shard' . get_shard($datetime) . "\n";

$datetime = '2009-05-02';
echo 'shard' . get_shard($datetime) . "\n";

$datetime = '2010-01-01';
echo 'shard' . get_shard($datetime) . "\n";

?>

Outputs:

shard1
shard3
shard6
inerte
I get the idea but your implementation still suffers from the unoptimized lookups. Additionally, you'd need to rewrite the whole function if you ever decide to change the shard name or split a long shard range into 2 shorter ones, for example.
Artem Russakovskii
+2  A: 

I am guessing that you don't want to have holes in date ranges, so I propose that you just need to specify an end date for each shard, and explicitly name one Default Shard which holds everything that is too new to fit into one of the other shards.

// configure shards
$SHARDS = array(
        // <end date>   => <shard number>
        '2007-12-31'    => 'shard1', // shard1 - up to end of 2007
        '2008-12-31'    => 'shard2', // shard2 - up to end of 2008
        '2009-06-30'    => 'shard3', // shard3 - up to end of June 09
        '2009-07-31'    => 'shard4', // shard4 - up to end of July 2009
        '2009-08-31'    => 'shard5', // shard4 - up to end of August 2009
        'DEFAULT'       => 'shard6', // everything else in shard 6
        );

This makes it easy to get the dates right, and the code for finding a shard based on date is simple:

function findShardByDate($date) {
    static $default = false;
    static $sorted = false;
    if($sorted === false) {
        // copy of global $SHARDS
        $SHARDS = $GLOBALS['SHARDS'];
        $default = $SHARDS['DEFAULT'];
        unset($SHARDS['DEFAULT']);
        // make sure $SHARDS is sorted
        ksort($SHARDS);
        $sorted = $SHARDS;
        unset($SHARDS);
    }
    // find the first shard which would contain that date
    foreach($sorted as $endDate => $shardName)
        if($endDate >= $date)
            return $shardName;
    // no shard found - use the default shard
    return $default;
}

Edit: Used static variables so that sorting is only done once.

too much php
Well, I wouldn't want to sort and lookup all the time, as that's expensive. Otherwise, I like your approach as indeed there are no holes.
Artem Russakovskii
Using start date instead of end date would eliminate the need for 'DEFAULT'.
Todd Owen
Using start date would require a default to catch anything *before* the first shard.
too much php
Optimize it for speed and you got a winner!
Artem Russakovskii