The setup: High traffic website and a list of image URLs that we want to display. We have one image spot, and each item in the set of image URLs has a target display percentage for the day. Example:

  • Image1 - 10%
  • Image2 - 30%
  • Image3 - 60%

Because the traffic amount can vary from day to day, I'm doing the percentages within blocks of 1000. The images also need to be picked randomly, but still fit the distribution accurately.

Question: I've implemented POC code for doing this in memcache, but I'm uncomfortable with the way the data is stored (multiple hash keys mapped by a "master record" with metadata). This also needs to be able to fall back to a database if the memcache servers go down. I'm also concerned about concurrency issues on the master record.

Is there a simpler way to accomplish this? Perhaps a fast mysql query or a better way to bring memcache into this?

Thanks

A: 

A database hit will most likely be slower, so I would stick with memcache. You are also likely to run into more concurrency issues with MySQL than with memcache. memcache is better equipped to handle a high request rate, and if the cache servers go down, the image distribution will be the least of your worries on a high-traffic website.

Maybe a MySQL expert can chime in here with a good query structure if you give us more specifics.

David Weitz
A: 

You could do what you said and pregenerate a block of 1000 values, each one an index into the list of images you'll return:

$distribution = "011022201111202102100120 ..."; # shuffled; each image index appears exactly its target number of times
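
One way to pregenerate such a block, as a minimal sketch. It assumes whole-number percentages that sum to 100 and at most ten images, so that each slot fits in a single digit. The function name buildDistribution and its parameters are made up for illustration.

```php
<?php
// Build one shuffled block in which each image index appears exactly
// its target share of times.
// $percentages: image index (0-9) => whole-number percentage, summing to 100.
function buildDistribution(array $percentages, int $blockSize = 1000): string
{
    if (array_sum($percentages) != 100) {
        throw new InvalidArgumentException('Percentages must sum to 100.');
    }

    $slots = [];
    foreach ($percentages as $imageIndex => $percent) {
        if (!is_int($imageIndex) || $imageIndex < 0 || $imageIndex > 9) {
            throw new InvalidArgumentException('Image index must be 0-9.');
        }
        // Number of slots this image gets in the block.
        $count = intdiv($blockSize * $percent, 100);
        $slots = array_merge($slots, array_fill(0, $count, (string) $imageIndex));
    }

    // Reject block sizes that the percentages do not divide evenly.
    if (count($slots) !== $blockSize) {
        throw new InvalidArgumentException('Block size does not divide evenly.');
    }

    shuffle($slots); // random order, exact counts
    return implode('', $slots);
}
```

For the 10/30/60 example, buildDistribution([0 => 10, 1 => 30, 2 => 60]) should give a 1000-character string containing exactly 100, 300 and 600 occurrences of '0', '1' and '2', in random order.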

Then store that block in both MySQL and memcache, and use another key (again in both) to hold the current index into the above string. Whenever the image script is hit, increment the value in memcache. If memcache goes down, go to MySQL instead (UPDATE, then SELECT; there may be a better way to do this part).
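
For the MySQL side, one candidate for the "better way" is MySQL's LAST_INSERT_ID(expr) form, which makes the UPDATE remember the new value for the current connection, so the follow-up SELECT cannot pick up another client's increment. A rough sketch using PDO. The counters table, its name and counter_value columns, and the mysqlIncrement function are all assumptions for illustration, not anything from the question.

```php
<?php
// Atomically increment a named counter in MySQL and return the new value.
// Hypothetical schema: counters(name VARCHAR(64) PRIMARY KEY,
//                               counter_value BIGINT UNSIGNED NOT NULL)
function mysqlIncrement(PDO $pdo, string $name): int
{
    // LAST_INSERT_ID(expr) stores expr as this connection's "last insert id".
    $update = $pdo->prepare(
        'UPDATE counters
            SET counter_value = LAST_INSERT_ID(counter_value + 1)
          WHERE name = ?'
    );
    $update->execute([$name]);

    if ($update->rowCount() !== 1) {
        throw new RuntimeException("Unknown counter: $name");
    }

    // Per-connection value, so concurrent requests do not interfere.
    return (int) $pdo->query('SELECT LAST_INSERT_ID()')->fetchColumn();
}
```

Both statements have to run on the same connection for this to work.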

To keep memcache and MySQL in sync, you could have a cron job copy the current index value from memcache to MySQL. You'll lose some accuracy (after a failover the MySQL index can be stale by up to one cron interval), but that may not be critical in this situation.

You could store multiple distributions in both MySQL and memcache and have another key that points to the currently active distribution. That way you can pregenerate future image blocks. When the index runs past the end of the current distribution, the script would increment that key and move on to the next one.
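
A variation that avoids a separate "current distribution" key: let the hit counter grow without ever resetting, and derive both the block number and the offset from it. This is only a sketch under assumptions: all blocks have the same length, the counter is 1-based (as an increment starting from 0 would return), and the stored blocks are reused in rotation once they run out. locateSlot and pickImage are hypothetical names.

```php
<?php
// Split a 1-based hit number into [block number, offset within that block].
function locateSlot(int $hitNumber, int $blockSize = 1000): array
{
    $zeroBased = $hitNumber - 1;
    return [intdiv($zeroBased, $blockSize), $zeroBased % $blockSize];
}

// $blocks: list of equal-length digit strings.
// $images: image index => filename.
function pickImage(int $hitNumber, array $blocks, array $images): string
{
    [$blockNo, $offset] = locateSlot($hitNumber, strlen($blocks[0]));
    // Assumption: rotate through the stored blocks once they are used up.
    $block = $blocks[$blockNo % count($blocks)];
    return $images[(int) $block[$offset]];
}
```

Because the block and offset are computed from a single atomic counter, two requests arriving at the block boundary cannot both advance the active block.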

Roughly:

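function FetchImageFname( )
{
  $images = array( 0 => 'image1.jpg', 1 => 'image2.jpg', 2 => 'image3.jpg' );
  $distribution = FetchDistribution( );
  $currentindex = FetchCurrentIndex( );

  // If the index has run past the end of the block, move on to the next
  // pregenerated distribution. Give up after 10 attempts.
  $x = 0;
  while( !isset( $distribution[$currentindex] ) && $x < 10 )
  {
    // Assumed to also reset the index counter for the new block.
    IncrementCurrentDistribKey( );
    $distribution = FetchDistribution( );
    $currentindex = FetchCurrentIndex( );
    $x++;
  }

  if( !isset( $distribution[$currentindex] ) )
  {
    // XXX Tried and failed. Send error to central logs.
    return( $images[0] );
  }

  // Each character in the block is an index into $images.
  return( $images[$distribution[$currentindex]] );
}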

function FetchDistribution( )
{
  $current_distrib_key = FetchCurrentDistribKey( );
  $distribution = FetchFromMemcache( $current_distrib_key );
  if( !$distribution )
    $distribution = FetchFromMySQL( $current_distrib_key );
  return $distribution;
}

function FetchCurrentIndex( )
{
  $current_index = MemcacheIncrement( 'foo' );
  if( $current_index === false )
    $current_index = MySQLIncrement( 'foo' );
  return $current_index;
}

... and so on. The function names aren't great, but I think you'll get the idea. When the memcache server comes back up, copy the current data from MySQL back into memcache and it takes over again immediately.

dpk