How to efficiently unpublish all datas from a particular user on a blogging application ?

We develop and operate a blogging application in which user data a scattered across many tables:
- Blog
- Article
- Comment
- Message
- Trackback
- 50 other tables.

Users are able to close their account, and their account/contents must disappear from the site right away.

For legal/contractual reasons, we also must be able to undelete their account/content for a given duration, and also to make those data available for juridic authorities for another duration.

Over the years and different applications, we used different approaches:

"deleted" flag everywhere : Each table has a "deleted" column, which is updated when data is deleted/restored. Very nasty because it slows down every list generation queries, creates a lot of updates upon deletion/restore. Also, it does not handle the two stage deletion described above. In fact we never used this one, but it's worth dis-advising it :)
"Multi table": For each table, we create a second table with the same schema plus two extra fields (dateDeleted, reason). The extra fields are used to know if the data is still accessible for restoration, when to delete it, and why/how it was deleted in the first place. This version is just a bit better than the previous version, but can be very nasty performance wise too when tables are growing. Also, you have to change the schema of some tables (ie: remove UNIQUE constraints) which makes the system harder to understand/upgrade for new developers, administrators ... and mentally healthy people in general.
"Multi DB": Same approach as before, but we move data on a different database cluster, which allows to browse those data without impacting the "end users" db. Also, for this app, the uniqueness constraint is done at the java level, so all the schemas are the same. Lastly, the double data retention constraint is done by having a dedicated DB for each constraint, which makes things easiers.

I have to admit that none of those approaches satisfies me, even if they can work up to a certain amount of data. I have also imagined that we could just delete some key rows in the DB, and let the rest inconsistent (and scheduled for a more controlled deletion job), but it scares me ...

Do you know other ways of doing the same thing, keeping the same level of features (we could align the two durations to simplify the problem) ? I'm not looking a solution for my existing apps, but would like to improve the next ones.

Any input will be highly appreciated !

ansaurus

tags:

views:

answers:

How to efficiently unpublish all datas from a particular user on a blogging application ?

related questions