I was wondering what would be the best solution to dynamically archive rows. For instance when a user marks a task as completed, that task needs to be archived yet still accessible.

What would be the best practice for achieving this? Should I just leave everything in the same table and exclude completed tasks from the queries? I'm afraid that over time the table will become huge (1,000,000 rows in a year or less). Or should I create another table, e.g. task_archive, and query that table whenever archived data is needed?

I know similar questions have been asked before, but most of them were about archiving thousands of rows simultaneously. I just need to know what would be the best method (and why) to archive 1 row at a time once it's been marked completed.

A: 

You could use a trigger that fires when the task is marked completed, removes the row from the current table, and inserts it into the archive table.
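As a rough illustration of the trigger approach, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (tasks, task_archive), the completed flag column, and the trigger name are assumptions made for the example, not anything from the question. Trigger rules also differ between engines: MySQL, for one, does not let a trigger modify the table it fires on, so there the delete would have to happen elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema for illustration only.
CREATE TABLE tasks (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    completed INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE task_archive (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);

-- When a task is flagged completed, copy it to the archive and
-- remove it from the live table.
CREATE TRIGGER archive_completed_task
AFTER UPDATE OF completed ON tasks
WHEN NEW.completed = 1
BEGIN
    INSERT INTO task_archive (id, title) VALUES (NEW.id, NEW.title);
    DELETE FROM tasks WHERE id = NEW.id;
END;

INSERT INTO tasks (id, title) VALUES (1, 'write report'), (2, 'send invoice');
""")

# Marking task 1 as completed fires the trigger.
with conn:
    conn.execute("UPDATE tasks SET completed = 1 WHERE id = 1")

live = conn.execute("SELECT id FROM tasks ORDER BY id").fetchall()
archived = conn.execute("SELECT id, title FROM task_archive").fetchall()
```

After the update, task 1 exists only in task_archive and task 2 remains in tasks.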

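Or you could create a stored procedure that performs the archive in a single transaction. For example, in MySQL-style syntax (the INT parameter type is an assumption; in the mysql client the definition also needs to be wrapped in DELIMITER statements):

    CREATE PROCEDURE sp_markcompleted(IN p_taskid INT)
    BEGIN
        START TRANSACTION;
        -- copy the completed row into the archive table
        INSERT INTO newtable SELECT * FROM oldtable WHERE id = p_taskid;
        -- then remove it from the live table
        DELETE FROM oldtable WHERE id = p_taskid;
        COMMIT;
    END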

Gary
+1  A: 

For speed and ease of use, I would generally leave the row in the same table (and flag it as completed) and then later move it to an archive table. This way the user doesn't incur the delay of making that move on the spot; the move can happen as a batch process during non-busy periods.

When that move should happen depends on your application. For example, if users have a "Recently Completed Tasks" dashboard widget that lists all of the tasks completed in the past week (and lets them drill in to see details), it might make sense to move rows to the archive a week after they've been completed. Or, if they frequently need to look at tasks from the current semester (for an academic app) but rarely at those from previous semesters, make the batch move happen at the end of the semester.
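A minimal sketch of this flag-now, move-later pattern, again with Python's sqlite3 module. The schema, the completed_at column, and the function names mark_completed and archive_completed_before are all hypothetical; dates are stored as ISO-8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema for illustration only.
CREATE TABLE tasks (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    completed_at TEXT            -- NULL while the task is still open
);
CREATE TABLE task_archive (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    completed_at TEXT NOT NULL
);
""")

def mark_completed(conn, task_id, when):
    """Cheap user-facing step: only flag the row."""
    with conn:
        conn.execute(
            "UPDATE tasks SET completed_at = ? WHERE id = ?", (when, task_id)
        )

def archive_completed_before(conn, cutoff):
    """Batch step for quiet periods: move rows completed before `cutoff`."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO task_archive (id, title, completed_at) "
            "SELECT id, title, completed_at FROM tasks "
            "WHERE completed_at IS NOT NULL AND completed_at < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM tasks "
            "WHERE completed_at IS NOT NULL AND completed_at < ?",
            (cutoff,),
        ).rowcount
    return moved

with conn:
    conn.executemany(
        "INSERT INTO tasks (id, title) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
mark_completed(conn, 1, "2024-01-01")
mark_completed(conn, 2, "2024-01-20")

# Nightly job: archive everything completed before the cutoff date.
moved = archive_completed_before(conn, "2024-01-08")
remaining = [r[0] for r in conn.execute("SELECT id FROM tasks ORDER BY id")]
```

Here only task 1 is old enough to be moved; task 2 stays in the live table as a recently completed task, and task 3 is still open.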

If the table is properly indexed, 1,000,000 rows shouldn't be that big a deal, honestly.

JacobM
Wouldn't that complicate things? If a user looks at all of the completed tasks within a date range (sorted by creation date), I would have to query both tables using `UNION ALL`, which would also affect performance.
Serge
Yes. The goal would be to move things to the archive at some point after the user is likely to want to make such a query; maybe that's 5 years for your application. The idea is that the types of queries that are made often are served by the regular task table, while queries against the archive should be rare. It might even be that the "archive" is in a separate system altogether such as a data warehouse.
JacobM
I see what you mean, makes perfect sense. I'll still need to query the archived table for statistical purposes but if I think long term then those queries should be rare like you said. Thanks!
Serge
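For the rare date-range query that has to span both tables, as discussed in the comments above, a sketch with `UNION ALL` might look like the following. The column names (created_at, completed) and the helper completed_between are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema and sample rows for illustration only.
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY, title TEXT, created_at TEXT, completed INTEGER
);
CREATE TABLE task_archive (
    id INTEGER PRIMARY KEY, title TEXT, created_at TEXT
);
INSERT INTO tasks VALUES
    (3, 'recent done', '2024-03-01', 1),
    (4, 'still open',  '2024-03-05', 0);
INSERT INTO task_archive VALUES
    (1, 'old done',   '2023-01-10'),
    (2, 'older done', '2022-06-01');
""")

def completed_between(conn, start, end):
    """Completed tasks created in [start, end], from both tables."""
    return conn.execute(
        """
        SELECT id, title, created_at FROM tasks
         WHERE completed = 1 AND created_at BETWEEN ? AND ?
        UNION ALL
        SELECT id, title, created_at FROM task_archive
         WHERE created_at BETWEEN ? AND ?
        ORDER BY created_at
        """,
        (start, end, start, end),
    ).fetchall()

rows = completed_between(conn, "2023-01-01", "2024-12-31")
```

The trailing ORDER BY applies to the combined result, so rows from both tables come back in creation order.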