views:

55

answers:

2

Hi guys,

I have a model Post which has a expiry_date. I want to know what is the best way to manage scalability in this case. 2 options:

  1. Whenever I want to SELECT from the table, I need to include where expiry_date > NOW. If the table Post grows like a monster, I will be in trouble. Imagine after 3 years or more. Indexes will be huge too.

  2. Have a trigger, cron job, or a plugin (if it exists) that would go around the table and move expired items to a new table Post_Archive. That way, I maintain only current Posts in my main table, which implies that over 3 years I won't be as bad as option 1.

Any help? comments from your experience?

Thanks, Youssef

A: 

If you need to archive data on a continuous basis (your #2) than a good option is MaatKit.

http://www.maatkit.org/

It can "nibble" away at data in chunks rather than running mass queries which consume lots of resources (and avoiding polluting your key cache).

So yes, you would run a Maatkit job from cron.

In the meantime, if you want to do #1 at the same time, you could maybe implement a view which conveniently wraps up the "WHERE expiry_dat > NOW" condition so you dont have to include it all on your code.

Cody Caughlan
Maatkit is an excellent pointer, thanks -- mk-archiver is a pretty complicated utility, but a companion article at http://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/ explains the "principles of opeations" and the ideas behind it in an extremely instructive way -- so again thanks for leading me to finding it!-)
Alex Martelli
+1  A: 

A cron job sounds good to me, and it can be done by feeding a simple script directly to the mysql command, e.g., roughly:

CREATE TEMPORARY TABLE Moving
SELECT * FROM Post WHERE expiry > NOW();

INSERT INTO Post_Archive
SELECT * FROM Moving;

DELETE FROM Post
WHERE id IN (SELECT id FROM Moving);

DROP TEMPORARY TABLE Moving;
Alex Martelli