views:

44

answers:

2

I am designing my database so i can do incremental backups. I am thinking there should be a weekly cron job that dumps data in a read only tar.bz2 and can dump most of my data easily since its had a modified date or is insert only. However there are some things like user_profile which holds the password and signature. How should i know when this needs to be dumped/updated? should i use a bit called is_dirty or should i have a modified date? What are the pros and cons for each?

currently i am using sqlite, i am switching to tsql and if there are problems i may switch to mysql depending on my host provider

+1  A: 

last_modified would work well, I suggest your logical export keep its own data about what was last backed up and you do not go the way of an is_dirty flag. Let your design for your data be about your data and let your logical export be the one to track what was last backed up. And, before all that, have you looked thoroughly for a backup system that has already been written? ( why re-invent the wheel? )

Also, further consideration, any flag column usually doesn't index so well so querying on the column can be a real drag. Unless you use a bitmap index, which has its own caveats and things to be wary of.

I always design all my tables to have an entry_date field (with triggers to support) and an mod_date field (again with triggers).

harschware
What do you mean by triggers? logic triggers in code or triggers in db? (I heard of sqlite triggers but i dont know what they are). I am not writing up a tool to do backups. This is just so i am not forced to dump the entire table everytime i want to backup data.
acidzombie24
Yes, triggers are a stored procedure that runs inside your database of choice. Most, but not all, databases support them - in some form or another. In this case, you would write a trigger to auto fill your entry_date and mod_date with the current time on insert or update (respectively). You shouldn't rely on your application code to do this, for a variety of reasons, but the foremost would be time synchronization accross clients.
harschware
+2  A: 

I would suggest using last_modified as this timestamp would be not only be effective for your backup process, but such a field is generally useful data to track. Compared to an is_dirty flag, knowing when a field was updated will help with tasks like debugging production problems (knowing when a user profile was updated and relating that to the time of the user's complaint) and the validation of enhancements to your db updates.

akf