I have a site where people can add their favorite TV shows.
I would like to have some trends statistics. Example:

  1. (1 unchanged) The Big Bang Theory
  2. (3rd last week) How I Met Your Mother
  3. (2nd last week) House
  4. (30th last week, up 400%) Nikita

I'm not sure how to design the database for this, but here is my idea:

  1. Once a week, I run a cronjob.
  2. The cronjob calculates each show's current position.
  3. Last week's position gets copied over to another db-column.
  4. From these two values (columns), I can calculate the change.

Is this approach fine? How would you do it? :)

PS. I'm a Rails coder, but that should not matter, unless there are some plugins already made for a similar purpose.

A: 

You could add two indexes to the data table:

t_1, t_2

Then a cronjob every week copies t_1 to t_2 and recalculates each t_1.

I find it effective because you "pay" only for 2 indexes on the data table, but you won't need any join when reading the data.
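
A minimal sketch of that weekly rotation, using Python's built-in sqlite3 module so it runs anywhere. The `shows` table, its `favorites` counter, and the function names are assumptions made up for illustration; only `t_1` and `t_2` come from the answer above, and ranking by favorite count is just one possible definition of "position".

```python
import sqlite3

def rotate_ranks(conn):
    """Weekly job: move current rank (t_1) to last week's (t_2), then re-rank."""
    with conn:  # commit both steps together, or roll back on error
        conn.execute("UPDATE shows SET t_2 = t_1")
        rows = conn.execute(
            "SELECT id FROM shows ORDER BY favorites DESC, id"
        ).fetchall()
        conn.executemany(
            "UPDATE shows SET t_1 = ? WHERE id = ?",
            [(rank, show_id) for rank, (show_id,) in enumerate(rows, start=1)],
        )

def trends(conn):
    """Return (name, rank, last_rank, change); a positive change means moved up."""
    return [
        (name, t_1, t_2, None if t_2 is None else t_2 - t_1)
        for name, t_1, t_2 in conn.execute(
            "SELECT name, t_1, t_2 FROM shows ORDER BY t_1"
        )
    ]

# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT, "
    "favorites INTEGER, t_1 INTEGER, t_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO shows (name, favorites) VALUES (?, ?)",
    [("The Big Bang Theory", 90), ("House", 70), ("How I Met Your Mother", 60)],
)
rotate_ranks(conn)  # week 1
conn.execute(
    "UPDATE shows SET favorites = 80 WHERE name = 'How I Met Your Mother'"
)
rotate_ranks(conn)  # week 2

for row in trends(conn):
    print(row)
```

Reading the trend list is then a single-table SELECT with no join, which is the point of keeping both values on the same row.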

sathia
That was what I was thinking. For now only two indexes are enough, but what if I wanted to expand later on, and see trends for a month or a full year?
Frexuz
You add more indexes. Someone will say that more indexes are bad for a database; it all depends on how many reads vs. writes you have.
sathia
A: 

The MovieVotes table tracks votes for each day. The MovieRating table is a periodic (weekly) snapshot.

One row in the Calendar table is one day.

The CalendarId in the MovieRating table points to the last day of the rating period, in this case WHERE DayInWeek = 7.

The CalendarId in the MovieVotes table points to the current day.

From the MovieRating you can lookup weekly rating and votes. From the MovieVotes you can aggregate votes for an arbitrary period.
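
Since the diagram is not reproduced here, the following is only a rough sketch of how those three tables might fit together, written with Python's sqlite3 module. The column names (`FullDate`, `Votes`, `Rating`), the sample data, and the two helper functions are assumptions, as is treating `Rating` as a rank by weekly votes and assuming CalendarId values are consecutive, one per day.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar (CalendarId INTEGER PRIMARY KEY, FullDate TEXT, DayInWeek INTEGER);
CREATE TABLE MovieVotes (CalendarId INTEGER, MovieId INTEGER, Votes INTEGER,
                         PRIMARY KEY (CalendarId, MovieId));
CREATE TABLE MovieRating (CalendarId INTEGER, MovieId INTEGER, Votes INTEGER,
                          Rating INTEGER, PRIMARY KEY (CalendarId, MovieId));
""")

# Fourteen consecutive days; isoweekday() gives Monday=1 .. Sunday=7.
start = date(2010, 11, 1)
for i in range(14):
    d = start + timedelta(days=i)
    conn.execute("INSERT INTO Calendar VALUES (?, ?, ?)",
                 (i + 1, d.isoformat(), d.isoweekday()))
    # Made-up daily vote counts for two movies.
    conn.execute("INSERT INTO MovieVotes VALUES (?, 1, ?)", (i + 1, 10 if i < 7 else 20))
    conn.execute("INSERT INTO MovieVotes VALUES (?, 2, 15)", (i + 1,))

def snapshot_week(conn, end_id):
    """Write one MovieRating row per movie for the 7 days ending at end_id."""
    totals = conn.execute(
        "SELECT MovieId, SUM(Votes) FROM MovieVotes "
        "WHERE CalendarId BETWEEN ? AND ? "
        "GROUP BY MovieId ORDER BY SUM(Votes) DESC, MovieId",
        (end_id - 6, end_id),
    ).fetchall()
    conn.executemany(
        "INSERT INTO MovieRating (CalendarId, MovieId, Votes, Rating) "
        "VALUES (?, ?, ?, ?)",
        [(end_id, movie_id, votes, rank)
         for rank, (movie_id, votes) in enumerate(totals, start=1)],
    )

def votes_between(conn, movie_id, first_date, last_date):
    """Aggregate MovieVotes over an arbitrary date range (ISO date strings)."""
    return conn.execute(
        "SELECT COALESCE(SUM(v.Votes), 0) FROM MovieVotes v "
        "JOIN Calendar c ON c.CalendarId = v.CalendarId "
        "WHERE v.MovieId = ? AND c.FullDate BETWEEN ? AND ?",
        (movie_id, first_date, last_date),
    ).fetchone()[0]

# Take a snapshot for every period that ends on DayInWeek = 7.
week_ends = conn.execute(
    "SELECT CalendarId FROM Calendar WHERE DayInWeek = 7 ORDER BY CalendarId"
).fetchall()
for (end_id,) in week_ends:
    snapshot_week(conn, end_id)

weekly = conn.execute(
    "SELECT CalendarId, MovieId, Rating, Votes FROM MovieRating "
    "ORDER BY CalendarId, Rating"
).fetchall()
print(weekly)
print(votes_between(conn, 1, "2010-11-05", "2010-11-10"))
```

Comparing a movie's `Rating` in two consecutive snapshots gives the "3rd last week" style trend, and longer periods (month, year) only need a different date range in the aggregate query.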

Damir Sudarevic
A: 

Using Damir's model as an example, I'd flip the order of MovieID and CalendarID... you'll want to query for different CalendarIDs for the same movie more often than the reverse.

His MovieVotes table is already an aggregate by the day. Adding 7 values together for last week's total is NOT a challenge for a database, and it makes the MovieRating table unnecessary as an aggregation. If MovieVotes had a datetime column to store the exact time of each vote, then using MovieRating as a daily aggregation would be needed... no need to go through thousands of records each time you need to show the total. THAT's where preaggregating shines.

Now if you cluster the data on that PK of MovieID, CalendarID, you're golden. To calculate any date range for any movie, your DB will walk the B-tree to get to that movie ID, then walk the rest of the tree to get to your starting date. Now you're on the leaf block with the first date, and there's a good chance that ALL of your dates are on that block anyway. So you'll do no additional I/O to sum 7 days, just a little more CPU to read the rows out of the block and then sum the values.
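
Here is a small sketch of that layout in Python with sqlite3, where a `WITHOUT ROWID` table is stored in primary-key order and so plays the role of the clustered index. The sample votes, the `week_total` helper, and the assumption that CalendarID values are consecutive (one per day) are mine, not part of Damir's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite store the rows in primary-key order,
# so all days of one movie sit next to each other in the B-tree.
conn.execute(
    "CREATE TABLE MovieVotes (MovieId INTEGER, CalendarId INTEGER, "
    "Votes INTEGER, PRIMARY KEY (MovieId, CalendarId)) WITHOUT ROWID"
)
# Made-up data: 3 movies, 14 days, votes = movie * day.
conn.executemany(
    "INSERT INTO MovieVotes VALUES (?, ?, ?)",
    [(movie, day, movie * day) for movie in (1, 2, 3) for day in range(1, 15)],
)

def week_total(conn, movie_id, end_id):
    """Sum the 7 daily rows ending at end_id: one short range scan on the key."""
    return conn.execute(
        "SELECT COALESCE(SUM(Votes), 0) FROM MovieVotes "
        "WHERE MovieId = ? AND CalendarId BETWEEN ? AND ?",
        (movie_id, end_id - 6, end_id),
    ).fetchone()[0]

this_week = week_total(conn, 2, 14)
last_week = week_total(conn, 2, 7)
print(this_week, last_week)

# Ask SQLite how it would run the query; the wording varies by version.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(Votes) FROM MovieVotes "
    "WHERE MovieId = 2 AND CalendarId BETWEEN 8 AND 14"
).fetchall()
for row in plan:
    print(row[3])
```

With the key in (MovieId, CalendarId) order the weekly total needs no snapshot table at all; the same query with a wider range covers a month or a year.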

Stephanie Page