ansaurus

Question

What is the best way to implement soft deletion?

Answer 1

+24 A:

You could perform all of your queries against a view that contains the WHERE IS_DELETED='0' clause.

David Sokol 2008-09-16 00:49:19

This is what I did with one of my projects. It works well.

tj9991 2008-09-16 00:51:12

Second that. This sounds perfectly suited for a view.

Yadyn 2008-09-16 00:57:08

don't forget to index that column ;)

Jiaaro 2010-01-04 14:53:48

This column is not sargable, so index is unnecessary.

dario-g 2010-05-14 22:53:57

Answer 2

+7 A:

if the table is large and performance is an issue, you can always move 'deleted' records to another table, which has additional info like time of deletion, who deleted the record, etc

that way you don't have to add another column to your primary table

Jiaaro 2008-09-16 00:49:49

This very nearly got the accepted answer. Some of the tables will be large(ish) and I didn't think of the performance issue. If there was no audit database I would definately take this approach.

Josh Smeaton 2008-09-16 09:31:31

sweet... it's really not about getting points to me though... if it's useful that's what really matters.glad you found the answer helpful :)

Jiaaro 2008-09-16 20:26:56

I actually really like that idea because then my main table isn't cluttered with thousands of soft deleted records. How would you handle child and dependent records, though? Would you basically recreate the same table dependencies (such as for an order and order items)?

Arthur Chaparyan 2009-02-03 09:06:57

Answer 3

+17 A:

I would lean towards the "Rails way" with a deleted_at column that contains the datetime of when the deletion took place. Then you get a little bit of free metadata about the deletion. For your SELECT just get rows WHERE deleted_at IS NULL

ctcherry 2008-09-16 00:50:42

this is how we do it -- deletion is always a date field

Jeff Atwood 2008-09-16 01:11:11

+1 because it holds exactly the same information as the solution presented in the "best answer" and additonally gives you information at what date the deletion took place.

Willi 2010-01-10 11:06:02

Answer 4

+3 A:

That depends on what information you need and what workflows you want to support.

Do you want to be able to:

know what information was there (before it was deleted)?
know when it was deleted?
know who deleted it?
know in what capacity they were acting when they deleted it?
be able to un-delete the record?
be able to tell when it was un-deleted?
etc.

If the record was deleted and un-deleted four times, is it sufficient for you to know that it is currently in an un-deleted state, or do you want to be able to tell what happened in the interim (including any edits between successive deletions!)?

Daniel Fortunov 2008-09-16 00:57:24

Answer 5

+3 A:

You will definitely have better performance if you move your deleted data to another table like Jim said, as well as having record of when it was deleted, why, and by whom.

Adding where deleted=0 to all your queries will slow them down significantly, and hinder the usage of any of indexes you may have on the table. Avoid having "flags" in your tables whenever possible.

UltimateBrent 2008-09-16 00:57:50

Answer 6

+10 A:

Having is_deleted column is a reasonably good approach. If it is in Oracle, to further increase performance I'd recommend partitioning the table by creating a list partition on is_deleted column. Then deleted and non-deleted rows will physically be in different partitions, though for you it'll be transparent.

As a result, if you type a query like

SELECT * from table_name where is_deleted = 1

then Oracle will perform the 'partition pruning' and only look into the appropriate partition.

Sergey Stadnik 2008-09-16 01:01:14

I wish I could upvote this more than once, nice, easy and it just solved a problem I'm having

massimogentilini 2009-11-06 00:15:07

Answer 7

A:

I prefer using the 'pickle' python extension, and storing the result in the user's session. It's clean, because it's completely out of the database, and you don't have to deal with cron jobs at all.

It may not be the best solution for this senario, but It's good when storing information for an 'undo' link.

Psychcf 2008-09-16 01:02:43

Answer 8

A:

I prefer to keep a status column, so I can use it for several different configs, i.e. published, private, deleted, needsAproval...

Unkwntech 2008-09-16 01:09:30

Answer 9

A:

use a view, function, or procedure that checks is_deleted=0, i.e. don't select directly on the table in case the table needs to change later for other reasons

and index the is_deleted column for larger tables

since you already have an audit trail, tracking the deletion date is redundant

Steven A. Lowe 2008-09-16 03:25:48

Answer 10

+1 A:

Something that I use on projects is a statusInd tinyint not null default 0 column using statusInd as a bitmask allows me to perform data management (delete, archive, replicate, restore, etc.). Using this in views I can then do the data distribution, publishing, etc for the consuming applications. If performance is a concern regarding views, use small fact tables to support this information, dropping the fact, drops the relation and allows for scalled deletes.

Scales well and is data centric keeping the data footprint pretty small - key for 350gb+ dbs with realtime concerns. Using alternatives, tables, triggers has some overhead that depending on the need may or may not work for you.

SOX related Audits may require more than a field to help in your case, but this may help. Enjoy

2008-09-16 05:31:44

Answer 11

+4 A:

The best response, sadly, depends on what you're trying to accomplish with your soft deletions and the database you are implementing this within.

In SQL Server, the best solution would be to use a deleted_on/deleted_at column with a type of SMALLDATETIME or DATETIME (depending on the necessary granularity) and to make that column nullable. In SQL Server, the row header data contains a NULL bitmask for each of the columns in the table so it's marginally faster to perform an IS NULL or IS NOT NULL than it is to check the value stored in a column.

If you have a large volume of data, you will want to look into partitioning your data, either through the database itself or through two separate tables (e.g. Products and ProductHistory) or through an indexed view.

I typically avoid flag fields like is_deleted, is_archive, etc because they only carry one piece of meaning. A nullable deleted_at, archived_at field provides an additional level of meaning to yourself and to whoever inherits your application. And I avoid bitmask fields like the plague since they require an understanding of how the bitmask was built in order to grasp any meaning.

Jeremiah Peschka 2008-09-16 12:47:20

I agree, a "deleted_at" is much more preferable to a simple "is_deleted" as you get extra information for free.

BobbyShaftoe 2009-01-02 22:20:01

Answer 12

+1 A:

you don't mention what product, but SQL Server 2008 and postgresql (and others i'm sure) allow you to create filtered indexes, so you could create a covering index where is_deleted=0, mitigating some of the negatives of this particular approach.

Andy Irving 2008-09-16 14:15:30

ansaurus

tags:

views:

answers:

What is the best way to implement soft deletion?

related questions