Hello all!

I have a question about my MySQL database and would like input on what would be most efficient.

My problem is as follows:

I'm developing premium functionality for my board game web site. One premium feature would be that all the games a user has played are stored "forever" (so the user can look them up afterwards). For normal users, games older than 18 months are deleted.

Now I need to figure out an efficient way to delete the games that are more than 18 months old for normal, non-premium users while keeping the games of premium users.

Simplifying things, I have two tables (in reality there's one more table, which stores the game participants for each game):

Games,

 id=INT
 play_date=DATETIME
 end_score=INT
 player_id_1=INT
 player_id_2=INT

Users,

 id=INT
 premium=BOOLEAN (true=enabled, false=not enabled)

The Users table contains 300,000+ rows, while the Games table contains a few million rows. About 20,000 games are added to the Games table each day.

What would be the most efficient way to remove games older than 18 months for NON-premium users?

So far we've removed games older than 18 months for ALL users each Monday morning.

Now I need to take both the premium value and the game date into account.

A few solutions(?):

  • JOINing the tables. But since we're talking about millions of rows in the Games table, would this be a no-no?
  • Get each game entry older than 18 months, then look up the user entries for player_id_1 & player_id_2. If ANY of them is premium, leave the game alone; otherwise delete it. For one week this could be 20k*7=140k games.
  • The solution above, except I run it every hour. Then there are approx. 1000 games I need to fetch and check.
  • ?? Add some kind of helper column to the Games table? But what if a user stops using premium....

Any tips welcome...

+1  A: 

Download a dump of the database and do some benchmarking on your own computer. This should give you a pretty good idea of how the various solutions perform.

Apart from that, it's the usual reply for SQL query questions: run EXPLAIN on your queries and make sure that you have the correct indexes.
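
As a rough illustration of checking a query plan, here is a minimal sketch. It runs against an in-memory SQLite database as a stand-in for MySQL, so it uses SQLite's EXPLAIN QUERY PLAN; in MySQL you would prefix the statement with EXPLAIN instead and the output looks different. The index name idx_games_play_date is made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real MySQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (
        id INTEGER PRIMARY KEY,
        play_date TEXT NOT NULL,
        end_score INTEGER,
        player_id_1 INTEGER,
        player_id_2 INTEGER
    );
    -- Hypothetical index name, chosen for this example.
    CREATE INDEX idx_games_play_date ON games (play_date);
""")

# Ask the database how it would find the games older than a cutoff.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM games WHERE play_date < ?",
    ("2008-07-01 00:00:00",),
).fetchall()

# The last column of each plan row is the human-readable description.
details = [row[-1] for row in plan]
for detail in details:
    print(detail)

# True when the plan mentions the index rather than a full table scan.
uses_index = any("idx_games_play_date" in detail for detail in details)
```

If the plan shows a full scan instead of the index, that is usually the first thing to fix before comparing the different delete strategies.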

Kimble
+3  A: 

Use an expire date and an index over that.

Allow NULL on that column.

Premium users will have NULL on their games.

Deleting games with expire_date < sysdate will use the index; that is, an INDEX RANGE SCAN (it has to be an ordered index, I mean, some kind of B-tree internal representation... but I'm not a specialist in MySQL).
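
A minimal sketch of that idea, run against an in-memory SQLite database rather than MySQL so that it is self-contained. The index name, the sample rows and the expire_games helper are assumptions made for the example, and the current time is passed in as a parameter to keep the result predictable.

```python
import sqlite3

def expire_games(conn, now):
    """Delete games whose expire_date has passed.

    Rows with expire_date = NULL never satisfy the comparison,
    so games kept for premium users are left alone.
    """
    cur = conn.execute("DELETE FROM games WHERE expire_date < ?", (now,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (
        id INTEGER PRIMARY KEY,
        play_date TEXT NOT NULL,
        expire_date TEXT  -- NULL means "keep forever"
    );
    -- Hypothetical index name; the range condition above can use it.
    CREATE INDEX idx_games_expire_date ON games (expire_date);
""")
conn.executemany(
    "INSERT INTO games (id, play_date, expire_date) VALUES (?, ?, ?)",
    [
        (1, "2008-01-10 12:00:00", "2009-07-10 12:00:00"),  # already expired
        (2, "2008-01-10 12:00:00", None),                   # premium, kept
        (3, "2009-12-01 12:00:00", "2011-06-01 12:00:00"),  # not expired yet
    ],
)

deleted = expire_games(conn, "2010-01-01 00:00:00")
remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
```

The weekly job then no longer needs to look at the Users table at all; the premium decision is made when expire_date is written.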

EDIT

Or maintain a separate table of record PKs with an expire_date, so that premium users' records do not occupy space in it. Then you do a delete from xxx where pk in (select pk from the expiring_table).

But that's not much of an improvement over the previous solution.

Ages

Maybe you can use ages (for example, 1 age = 1 month) and add a "month_to_live" field to the table. Each month you update the field with += 1 for all records where it is not null. Deleting then uses an equality filter. But as I said before, I'm not a specialist, so I don't know how much optimization you can gain from that.

I would still go with the expire_date field (you also get the ability to extend the lifetime of any individual game without incurring extra overhead when you delete records).

helios
Not a bad idea, but you'll have to deal with new users signing up for the premium account and users who stop their premium subscription (if that's possible in this case).
Kimble
Yep, in that case you should set the expire date to NULL or reassign it. For reassigning, you can apply now+18m or use the original play_date. That only affects one user's games each time, so it's fine.
helios
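
One way the status change could look, as a sketch on an in-memory SQLite database (MySQL's date arithmetic is written differently). The set_premium helper is hypothetical. It recomputes the date from the original play_date, and it leaves a game at NULL when the other player is still premium, following the "if ANY of them is premium" rule from the question.

```python
import sqlite3

def set_premium(conn, user_id, premium):
    """Hypothetical helper: flip a user's premium flag and fix up their games."""
    conn.execute("UPDATE users SET premium = ? WHERE id = ?", (int(premium), user_id))
    if premium:
        # Upgrade: this user's games no longer expire.
        conn.execute(
            "UPDATE games SET expire_date = NULL "
            "WHERE player_id_1 = ? OR player_id_2 = ?",
            (user_id, user_id),
        )
    else:
        # Downgrade: restore the expiry from the original play_date,
        # unless the other player in the game is still premium.
        conn.execute(
            """
            UPDATE games
            SET expire_date = datetime(play_date, '+18 months')
            WHERE (player_id_1 = :uid OR player_id_2 = :uid)
              AND NOT EXISTS (
                  SELECT 1 FROM users u
                  WHERE u.id IN (games.player_id_1, games.player_id_2)
                    AND u.premium = 1
              )
            """,
            {"uid": user_id},
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, premium INTEGER NOT NULL);
    CREATE TABLE games (
        id INTEGER PRIMARY KEY,
        play_date TEXT NOT NULL,
        player_id_1 INTEGER,
        player_id_2 INTEGER,
        expire_date TEXT
    );
    INSERT INTO users VALUES (1, 0), (2, 0), (3, 1);
    INSERT INTO games VALUES
        (1, '2008-01-10 12:00:00', 1, 2, '2009-07-10 12:00:00'),
        (2, '2008-01-10 12:00:00', 1, 3, NULL);
""")

query = "SELECT id, expire_date FROM games ORDER BY id"
set_premium(conn, 1, True)
after_upgrade = list(conn.execute(query))
set_premium(conn, 1, False)
after_downgrade = list(conn.execute(query))
```

Only the games of the one user who changed status are touched, which is why the update stays cheap.
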
To delete related records in other tables, I would use cascade delete, because the database then handles those deletes itself, which should be the fastest option.
helios
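
A small sketch of a cascading delete, again on in-memory SQLite. The game_participants table name and its columns are assumptions, since the question only says that such a table exists. The PRAGMA line is SQLite-specific; in MySQL the tables need a storage engine that enforces foreign keys, such as InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
    CREATE TABLE games (
        id INTEGER PRIMARY KEY,
        play_date TEXT NOT NULL
    );
    -- Hypothetical participants table; names are made up for the example.
    CREATE TABLE game_participants (
        game_id INTEGER NOT NULL REFERENCES games (id) ON DELETE CASCADE,
        user_id INTEGER NOT NULL,
        PRIMARY KEY (game_id, user_id)
    );
    INSERT INTO games VALUES (1, '2008-01-10 12:00:00'), (2, '2009-12-01 12:00:00');
    INSERT INTO game_participants VALUES (1, 10), (1, 11), (2, 10), (2, 12);
""")

# Deleting the game also removes its participant rows via the cascade.
conn.execute("DELETE FROM games WHERE id = ?", (1,))
conn.commit()

participants = list(
    conn.execute("SELECT game_id, user_id FROM game_participants ORDER BY game_id, user_id")
)
```
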
A: 

JOINing shouldn't be too bad; I guess you are not running that query "live"? Another option would be to apply the limit when displaying games: if the user is premium, apply no limit; otherwise limit the date range.
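
Both options can be sketched roughly as follows, on an in-memory SQLite database. The delete uses a NOT EXISTS subquery against Users instead of MySQL's multi-table DELETE syntax, and both helper functions are hypothetical names for the example. The cutoff is passed in as "now minus 18 months".

```python
import sqlite3

def delete_old_games(conn, cutoff):
    """Delete games older than cutoff in which no player is premium."""
    cur = conn.execute(
        """
        DELETE FROM games
        WHERE play_date < :cutoff
          AND NOT EXISTS (
              SELECT 1 FROM users u
              WHERE u.id IN (games.player_id_1, games.player_id_2)
                AND u.premium = 1
          )
        """,
        {"cutoff": cutoff},
    )
    conn.commit()
    return cur.rowcount

def visible_games(conn, user_id, cutoff):
    """Display-time filter: premium users see everything, others only recent games."""
    rows = conn.execute(
        """
        SELECT g.id
        FROM games g
        JOIN users u ON u.id = :uid
        WHERE (g.player_id_1 = :uid OR g.player_id_2 = :uid)
          AND (u.premium = 1 OR g.play_date >= :cutoff)
        ORDER BY g.id
        """,
        {"uid": user_id, "cutoff": cutoff},
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, premium INTEGER NOT NULL);
    CREATE TABLE games (
        id INTEGER PRIMARY KEY,
        play_date TEXT NOT NULL,
        player_id_1 INTEGER,
        player_id_2 INTEGER
    );
    INSERT INTO users VALUES (1, 0), (2, 0), (3, 1);
    INSERT INTO games VALUES
        (1, '2008-01-10 12:00:00', 1, 2),  -- old, no premium player
        (2, '2008-01-10 12:00:00', 1, 3),  -- old, one premium player
        (3, '2009-12-01 12:00:00', 1, 2);  -- recent
""")

cutoff = "2008-07-01 00:00:00"  # 18 months before 2010-01-01
seen_by_regular = visible_games(conn, 1, cutoff)
seen_by_premium = visible_games(conn, 3, cutoff)
deleted = delete_old_games(conn, cutoff)
remaining = [row[0] for row in conn.execute("SELECT id FROM games ORDER BY id")]
```

With the display-time filter, the delete job can run less often, since old games of regular users are hidden even before they are physically removed.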

DaDaDom