tags:

views:

82

answers:

2

We have a PHP website like reddit, users can vote for the stories.

We tried to use APC, memcached etc. for the website but we gave up. The problem is we want to use a caching mechanism, but users can vote anytime on site and the cached data may be old and confusing for the other visitors.

Let me explain with an example, We have an array of 100 stories and stored in cache for 5 mins., a user voted for some stories so the ratings of the stories are changed. When the other user enter the website, he/she will see the cached data, therefore the old data. (This is the same if the voter user refreshes the page, he'll also see the old vote number for the stories.)

We cannot figure it out, any help will be highly appreciated

+2  A: 

Perhaps this is a solution you've already considered, but why not just cache everything but the ratings? Instead, just update a single array, where the ith position contains the rating for the ith top story. Keep this in memory all the time, and flush ratings back to the database as it's available.

If you only care about the top N stories being the most up-to-date, then i only needs to be the size of the number of stories on the front page, which is presumably a very small number like 50 or so.

John Feminella
Yep definitely we thought about that. Actually this is the only solution we got so far. I opened this question, maybe there is a better solution for our problem. Could you suggest the best way to keep an array in memory for PHP, BTW thanks a lot for your response
alicia
+5  A: 

This is a matter of finding a balance between low-latency updates, and overall system/network load (aka, performance vs. cost).

  1. If you have capacity to spare, the simplest solution is to keep your votes in a database, and always look them up during a page load. Of course, there's no caching here.

  2. Another low-latency (but high-cost) solution is to have a pub-sub type system that publishes votes to all other caches on the fly. In addition to the high cost, there are various synchronization issues you'll need to deal with here.

  3. The next alternative is to have a shared cache (e.g., memcached, but shared across different machines). Updates to the database will always update the cache. This reduces the load on the database and would get you lower latency responses (since cache lookups are usually cheaper than queries to a relational database). But if you do this, you'll need to size the cache carefully, and have enough redundancy such that the shared cache isn't a single point of failure.

  4. Another, more commonly used, alternative is to have some kind of background vote aggregation, where votes are only stored as transactions on each of the front-end servers, and you have a background process that continuously (e.g., every five seconds) aggregates the votes and populates all the caches.

AFAIK, reddit does not do live low-latency vote propagation. If you vote something up, it isn't immediately reflected across other clients. My guess is that they're doing some kind of aggregation (as in #4), but that's just me speculating.

0xfe