views:

894

answers:

16

Are soft deletes a good idea or a bad idea?

Instead of actually deleting a record in your database, you would just flag it as "IsDeleted" = true, and upon recovery of the record you could just flag it as "False".

Is this a good idea?

EDIT: this might be a better idea??? what about physically deleting the record then moving it to a external achieve database, if the user wants the record back, then software will look for the record in the external database and recreate it?

+28  A: 

It's never a bad idea to avoid potential data loss.

I always soft-delete. In cases where the database needs to be scrubbed of one or more records, I generally employ either a two-step process of soft deletion and then emptying a "recycle bin" of records, or a document-management-style approach where document records can be aged away, and then go through an approval process prior to hard deletion.

Robert Harvey
The system often still needs the data for integrity, auditing or history of change... soft delete away! Use cleanup processes for real deletions etc... PK :-)
Paul Kohler
+1: `I always soft-delete.`
KMan
+9  A: 

It's a good idea when and if an invalid delete is absolutely catastrophic and recovery should be simple. It's also a good idea if you want to keep track of everything that has ever been and "delete" really only means "hide." Meaning, it's up to the situation.

Anthony Pegram
+4  A: 

Soft deletes would also allow you to revoke DELETE privileges from the database account used by the application.

Daniel Vassallo
*sigh* A database account used by an application? That would be lovely.
aehiilrs
Not that it would be useful for anything except human mistakes.. If I'm a malicious user and gain access to it I can still "DELETE" them by doing `UPDATE table SET Field1 = 0, Field2 = 0, ... WHERE 1`
Andreas Bonini
+9  A: 

It depends on the circumstances. I could see situations where you are legally required to truly delete something. Maybe someone has requested that their social security number be permanently removed from your system. Or maybe you have a duplicate record that you want to consolidate into a single record. Keeping the duplicate hanging around with a deleted flag might not be advantageous.

There is also one technical disadvantage: You can't do cascading deletions, which automatically clear out any references to the deleted data to prevent foreign key violations. This isn't necessarily a big issue, but it's something to keep in mind.

Otherwise, I think it's a good idea.

DanM
Good point re: cascading deletes.
MusiGenesis
You *can* do cascading deletes whether it's hard or soft.
UpTheCreek
+1  A: 

It comes with a cost, though, because you need to update your queries and indexes to be able to exclude the deleted rows.

Maybe instead of toggling a flag, move it to another "trash can" table.

Also, one could say that is only a partial solution, because it covers only deletes, but when you update a row, you are still overwriting the old value.

In general, I'd say never delete anything unless you really have to. Disk space is cheap these days. Of course, there are limits, there is data that you are legally bound to erase, there is data that is really not all that important, and maybe you do not need to keep the old data online and in the same table (an archive somewhere would also work).

Thilo
+7  A: 

One of the major problem for soft delete is those unwanted data will potentially affects the db performance. Several years ago one of my Client requested me to do soft delete on all database items, my solution to that is to move all "deleted" items to a backup table, instead of leaving it to the current running tables.

xandy
+1 for moving the data to another table!
scunliffe
You might instead partition the table by "IsDeleted" to mitigate performance implications if its possible to do so in your db. Then you don't have to worry about two different tables.
Michael Petito
+2  A: 

It depends on the data. Some data cannot be deleted due to legal/audit requirements.

Social networking sites on the other hand should provide an option to delete an account with all associated data, including contact info, photos, messages, etc. It's a real annoyance if they don't, e.g. Facebook.

armandino
And some data MUST be deleted due to legal requirements.
Thilo
Good point. Haven't thought of that one.
armandino
And some data MUST be deleted due to illegal requirements. :)
MusiGenesis
+1  A: 

Some times soft deletes are necessary. For example, say you have an Invoice table that references a Products table. Once you have created an Invoice with a specific Product you could then never delete that Product (and if your RI is set up correctly it won't let you).

This specific scenario assumes that you'll never want to delete the Invoices, which in a real company you probably wouldn't want to delete historical financial data.

Though there are many other cases where you would not be able to delete some data as a side effect of a dependency up the chain not being deletable for reasons business or other.

joshperry
A: 

It's a good idea if you want to show that a record is no longer current, but you can't delete it (due to referential integrity) or you have to maintain it for audit purposes. In one project, we had start date/end date columns on a lot of our tables so that we could soft-delete or "expire" records by setting the end date, but they would never be deleted.

Charles
+6  A: 

I say it's a bad idea, generally (with some exceptions, perhaps).

First, your database should be backed up regularly, so you should never be in a situation where you would lose data permanently because of a DELETE (unless it's a deletion of just-added data, of course).

Second, a soft delete like this means you now have to include a WHERE IsDeleted = false clause in every query on this table (and so much worse if you're JOINing these tables). A mistake here would be caught as soon as a user or tester noticed a deleted record showing up again, which might take some time. Also, it would be easy for a developer to omit the WHERE clause from COUNT(*) queries, which might take even longer to discover (I worked on one project where this had been happening for years; not many records were ever "deleted", so the totals were close to what was expected and no one noticed).

Finally, a soft delete will work on a table with artificial keys, but potentially won't work on a table with a natural primary key (e.g. you "delete" someone from a table keyed by Social Security Number - what do you do when you need to add him back? Please don't say "include IsDeleted in a compound primary key".).

In a design review, I would expect the developer to demonstrate an awareness of the costs and benefits and to present an excellent reason for doing soft deletes in this manner. "Why not do it?" is not an excellent reason.

MusiGenesis
It's fairly trivial to create a view and materialize it if need be, that excludes all is_deleted records. All queries can then be run against the view.As far as recovering from a backup - say for something like re-enabling a user account - you really don't want to go running to the archives to restore their data. You'd like to set is_deleted = false. This is a fairly primitive (and unlikely) scenario as you'd probably use a disabled field - but the point is the same. For some cases, to be able to recover the deleted data instantly could be worthwhile.
Josh Smeaton
@Josh: *everything* in SQL programming is fairly trivial; it's the accumulation of lots of trivial things that starts to cause problems. As I said in my last sentence, I would expect a developer to have a good reason for increasing the complexity of the system in this way. I'm disappointed but not at all surprised to see so many developers say they do this as a matter of course, regardless of the actual needs of their application.
MusiGenesis
@MG, "unless it's a deletion of just-added data, of course", Your RDBMS doesn't have a transaction log? you can't do point in time recovery?
Stephanie Page
Let me support MG. You have a timestamp field. You frequently range scan it via an index... everything between sysdate -2 and sysdate. If you soft delete frequently that range scan could return a high percentage of rows that will have to be filtered out. Filtering = expensive, range scan = cheap.
Stephanie Page
@Josh, if you're going to materialize the view, why not just make it a table from the state. Mviews are a good way to fix something that's broken - I'd hesitate to design that way from the start.
Stephanie Page
@steph I was explicitly referring to coding against the exclusion of is-deleted rows. It'd be much nicer to not have to remember to exclude the column in every where clause. In general I agree with you though.
Josh Smeaton
+3  A: 

Though this is not answering your question, you can find an interest post from Frans Bouma on his blog about this topic. I think it's worth reading it.

uvita
Good article - I think it answers the question just fine.
MusiGenesis
I'm not so sure. He claims that using unique constraints becomes impossible. That is not true for Oracle (and maybe other db's?). You can solve this problem with a unique constraint on a function based index. But just look at the post itself. You can still see it but you can no longer comment. In a way the article has been softly deleted. I would like to take a look in the db that stored that blog. Is it in the same table as the blogs you still can comment on or is it moved to another table?
TTT
A: 

I am a fan of soft-deletes. Primarily to prevent cascading deletes. However, it takes additional code so that if you are SELECTing a child object, it joins to the parent (and all parent!) objects to make sure none of them are deleted. Alternatively you can cascade the soft-delete, but if you want to restore them later you may not know which children had already been deleted and which were deleted due to the cascade.

Additionally, I keep a revision date time and revision username on each object, so that I know who modified (or soft-deleted) it last. Then for an audit trail, I create a *History (like CustomerHistory) table that is inserted after every UPDATE to the original table. This way after an object is modified or soft-deleted, I have a record of who performed the action as well as the last known state of the object.

Brad
Wouldn't it be easier to prevent cascading deletes by un-checking the "CASCADE DELETES" box (or however it's done in your database of choice)?
MusiGenesis
A: 

I encountered soft-deletes for the following broad scenarios:

CASE 1: remove the record from being user/code visible, but have the record at the DB level since the business is interested in knowing it had that records.
These requirements are mostly driven by the business & usually at the core is perhaps a legal requirement (like @joshperry & @armandino scenarios) to have the previous record in the database & create a new record for every change made. At this point, I would look at CASE 2 & evaluate if it satifys the requirements before having an IsDeleted flag

CASE 2: audit trails to keep track of the evolution of a record - there are tons of decent articles online for keeping audit trails of records in a database

HTH.

Sunny
+1  A: 

Just to add a cent. I always soft-delete; though it does cost the performance, but very slightly. Think about the cost, when your customer complains regarding your software that stopped functioning after she performed certain actions that even she can't remember. Well, this may be a fat example, but you would never know what went wrong, who did what, what was before and what was inserted afterwards. In that case this would come handy. This functionality comes handy for auditing purpose, and many a customer requests for auditing reports of this sort.

Also, in most workflow based applications, it comes as a software feature/requirement that customer is interested in the "actions" performed on a work item; what values were assigned and who processed it, etc.

KMan
+2  A: 

If you're going to use soft deletion, it's a good idea to have a deleted_date field, instead of an is_deleted field. You get a nice piece of extra data instead of just the bit field.

Josh Smeaton
+1 for using DateTime fields instead of Bits... the number of old systems I have to manage with an "IsSomething" and "DateSomethinged" is annoyingly high...
Keith Williams
A: 

in oracle, if you add the primary key to a recycle_bin table you make up, then add a row level security policy, you can suppress the values from all queries when the row is in the recycle bin, removing the pk from the recycle bin will automatically restore all data. no need to change your other queries to accomodate the logic.

Randy
Thats a very interesting feature, thanks for that.
K001