views:

140

answers:

9

Hi guys,

Part of my table design is to include a IsDeleted BIT column that is set to 1 whenever a user deletes a record. Therefore all SELECTS are inevitable accompanied by a WHERE IsDeleted = 0 condition.

I read in a previous question (I cannot for the love of God re-find that post and reference it) that this might not be the best design and an 'Audit Trail' table might be better.

How are you guys dealing with this problem?

Update I'm on SQL Server. Solutions for other DB's are welcome albeit not as useful for me but maybe for other people.

Update2 Just to encapsulate what everyone said so far. There seems to be basically 3 ways to deal with this.

  1. Leave it as it is
  2. Create an audit table to keep track of all the changes
  3. Use of views with WHERE IsDeleted = 0
+1  A: 

We're actively using the "Deleted" column in our enterprise software. It is however a source of constant errors when forgetting to add "WHERE Deleted = 0" to an SQL query.

Not sure what is meant by "Audit Trail". You may wish to have a table to track all deleted records. Or there may be an option of moving the deleted content to paired tables (like Customer_Deleted) to remove the passive content from tables to minimize their size and optimize performance.

Developer Art
+1 It's the most common bug we have
Faruz
Yeah the Audit trail was taken from the previous question. I'm kinda hoping he's going to notice this post and shed some light on it.And it probably means what you just explained it to me anyway.
Nai
A: 

If you have Oracle DB, then you can use audit trail for auditing. Check the AUDIT VAULT tool form OTN, here. It even supports SQL Server.

Guru
+1  A: 

A while ago there was some blog uproar on this issue, Ayende and Udi Dahan both posted on this.

BennyM
thanks for the links. will check them out.
Nai
+2  A: 

If you have to have this kind of Deleted Bit column, then you really should consider setting up some VIEWs with the WHERE clause in it, and use those rather than the underlying tables. Much less error prone.

For example, if you have this view:

CREATE VIEW [Current Product List] AS
SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No

Then someone who wants to see current products can simply write:

SELECT * FROM [Current Product List]

This is much less error prone than writing:

SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No

As you say, people will forget that WHERE clause, and get confusing and incorrect results.

P.S. the example SQL comes from Microsoft's Northwind database. Normally I would recommend NOT using spaces in column and table names.

Michael Dillon
Hmm I know what you mean but it's not possible given that its an OLTP environment and that results have to be up to the minute.
Nai
I think you are confusing views with something else. Views are just stored SELECT statements, they run against the current set of data.
APC
What I meant is that the returned selection has to be up to date at all times. Setting up triggers to refresh the views are too expensive to be doing all the time.
Nai
Plain old views are always up-to-date and don't need to be refreshed. They are just predefined ways of viewing data.
Rudd Zwolinski
Triggers to refresh views? A view does not need a trigger. A view is a SELECT
Mark Schultheiss
Please please please, don't ever put spaces in user-defined object names!
Philip Kelley
Good point Philip. I edited my answer to note this point.
Michael Dillon
A: 

Nai this is totally up to you.

Do you need to be able to see who has deleted / modified / inserted what and when? If so, you should design the tables for this and adjust your procs to write these values when they are called.

If you dont need an audit trail, dont waste time with one. Just do as you are with IsDeleted.

Personally, I flag things right now, as an audit trail wasn't specified in my spec, but that said, I don't like to actually delete things. Hence, I chose to flag it. I'm not going to waste a clients time writing something they diddn't request. I wont mess about with other tables because that's another thing for me to think about. I'd just make sure my index's were up to the job.

Ask your manager or client. Plan out how long the audit trail would take so they can cost it and let them make the decision for you ;)

Pace
Well, I happen to work for myself so the decision is ultimately up to me. However, given that the IsDeleted columns have already been set up, I imagine there will be some massive refactoring to accommodate an audit trail, hence my question on the pros and cons of my current design.
Nai
A: 

Udi Dahan said this:

Model the task, not the data Looking back at the story our friend from marketing told us, his intent is to discontinue the product – not to delete it in any technical sense of the word. As such, we probably should provide a more explicit representation of this task in the user interface than just selecting a row in some grid and clicking the ‘delete’ button (and “Are you sure?” isn’t it).

As we broaden our perspective to more parts of the system, we see this same pattern repeating:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Michael Dillon
+4  A: 

Therefore all SELECTS are inevitable accompanied by a WHERE IsDeleted = 0 condition.

This is not a really good way to do it, as you probably noticed, it is quite error-prone. You could create a VIEW which is simply

CREATE VIEW myview AS SELECT * FROM yourtable WHERE NOT deleted;

Then you just use myview instead of mytable and you don't have to think about this damn column in SELECTs.

Or, you could move deleted records to a separate "archive" table, which, depending on the proportion of deleted versus active records, might make your "active" table a lot smaller, better cached in RAM, ie faster.

peufeu
Ok ya this works. However, the data frequently inserted into the table hence the need to refresh the view quite often. The overhead associated with it outweighs the its benefits of guarding against carelessness (of not including IsDeleted condition).
Nai
@Nai: What do you mean by "refresh the view quite often"? Typically, a view is just a stored query (that is, they are always "up-to-date" because they simply query the original table). Perhaps you are thinking about *materialized* views? I believe they require "refreshing".
Adam Paynter
@Nai - A view is akin to a SELECT, and has NO "refresh" need - when you update the base table, the view is automatically set - the same as if you ran a SELECT with the IsDeleted = 0 in it.from MSDN:"SQL Server maintains views automatically;"
Mark Schultheiss
@adam @mark: oh man looks like I need to re-visit my understanding of views. I have always thought of them as a static snapshot of the results obtained in creating the view.
Nai
It seems a bit odd now thinking that I can do inserts/delete/update on views as well as opposed to the underlying tables. So what is the point of sp_refreshviews. MSDN says it refreshes the meta data, that is to say e.g., a new column being added to one of the underlying tables and sp_refreshviews is then needed to update the view? Is that correct?
Nai
@Nai: Suppose you create a view like `SELECT * FROM mytable ...`. If `mytable` had 5 columns when you created the view, the view will continue to return *just* those 5 columns, even if `mytable` gains a 6th column. `sp_refreshviews` forces the view to return all 6 columns in this scenario.
Adam Paynter
@Adam: Great thanks for that.
Nai
A: 

Views (or stored procs) to get at the underlying table data are the best way. However, if you have the problem with "too many cooks in the kitchen" like we do (too many people have rights to the data and may just use the table without knowing enough to use the view/proc) you should try using another table.

We have a complete mimic of the base table with a few extra columns for tracking. So Employee table has an EmployeeDeleted table with the same schema but extra columns for when it was deleted and who deleted it and sometimes even the reason for deletion. You can even get fancy and have triggers do the insertion directly instead of going through applications/procs.

Biggest Advantage: no flag to worry about during selects

Biggest Disadvantage: any schema changes to the base table also have to be made on the "deleted" table

Best for: situations where for whatever reason (usually political with us) many not-as-experienced people have rights to the data but still expect it to be accurate without having to understand flags or schemas, etc

ktharsis
A: 

I've used soft deletes before on a number of applications I've worked on, and overall it's worked out quite well. Yes, there is the issue of always having to remember to add AND IsActive = 1 to all of your SELECT queries, but really that's not so bad. You can create views if you don't want to have to remember to always do that.

The reason we've done this is because we had very specific business needs to be able to report on records that have been deleted. The reporting needs varied widely - sometimes they'd need to see just the active records, or just the inactive records, or sometimes a mix of both - so pushing all the deleted records into an audit table wasn't a very good option.

So, depending on your particular business needs, I think this approach is certainly a viable option.

bcwood