views:

187

answers:

2

I am trying to program a plugin to bbPress (the open source forum software) that will work similar to Hacker News (http://news.ycombinator.com/).

Specifically, I want to sort the order of forum-threads (bbPress calls them "topics") using the following algorithm:

sort_value = (p - 1) / (t + 2)^1.5
where p = total votes for each topic from users
t = time since submission of each topic in hours

I would like to be able to sort topics by this calculated sort_value using MySQL.

The relevant fields in the topics table looks something like this:

topic_id            bigint(20)
topic_start_time    datetime

This is up in the air, but I was thinking there will be another table that stores individual votes by users so we'll be able to know whether a user has voted already. And another table will store the current vote-totals for each topic. Maybe there will be another field in that table storing the latest calculated sort_Value?

To be 100% accurate, the sort_value should be updated after each new vote. This would add too much load to the database server, though, especially if we tried to update ALL the topics. If we have to, we could limit the dataset by only calculating the sort_value for the last X # of topics. We could also limit the load by only updating the sort_value periodically (e.g. every 5 minutes via a cron job).

These shortcuts might make the load acceptable, but I would prefer a more elegant solution that could scale better.

How would you structure this? :-)

A: 

OK, this is my idea. I'll start by creating an old_table that has X rows of topics with a sort_value field.

I want to avoid tons of UPDATE statements on a single table, so I'll periodically replace the old table with a freshly calculated table. As far as I'm aware, MySQL does not support a "replace table" syntax, so every Y minutes, via cron, I'll create an updated version of this table called new_sort_value. Then I'll do this sequence of commands:

  • DROP old_table
  • RENAME new_table to old_table

Does this seem like a valid approach?

bobbyh
I thinks that's valid if a little clumsy. Unfortunately your are dealing with the constraints of the system you are adding to. Scaling this sort of problem is exactly the sort of thing rdbms databases don't do well at.Something like a CouchDB View would be right up this alley.
Jeremy Wall
Thanks, Jeremy. I'll check out CouchDB.I just thought of another tweak to this idea, which is to just save (elsewhere) a value that says which `table` is active. Say that current value is `old_table`. This would tell my app to do a JOIN against `old_table`. Then, after creating an updated `new_table`, I'd UPDATE the "Active database" value to `new_table`.This would avoid a DROP of a table that is being requested for regular JOINs.
bobbyh
+1  A: 

There are a number of tradeoffs to consider in this. You've hinted at them already in your question. Timeliness and Exactness vs Load and Scale.

Batching the calculations is the best way to decrease Load and increase Scale if Timeliness and Exactness aren't necessary and the system experiences a high load of writes.

You really have to sort of examine the the useage of the system and determine what areas you need to optimize for. Optimizing for Write has different constraints than optimizing for Reads. Same for timeliness or Exactness of the data.

Determine which ones are most important for your application and make the appropriate tradeoff.

Jeremy Wall