45 views · 3 answers
I have a question related to database design. The database that I'm working with requires that data be treated in such a way that it is never physically deleted. We started down the path of adding a "DeleteDateTime" column to some tables; it is NULL by default, but once stamped it marks a record as deleted.

This gives us the ability to archive our data easily, but I still feel in the dark on a few areas, specifically whether this would be considered in line with best practices, and also how to go about indexing these tables efficiently.

I'll give you an example: We have a table called "Courses" with a composite primary key made up of the columns "SiteID" and "CourseID". This table also has a column called "DeleteDateTime" that is used in accordance with my description above.
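For reference, here is a minimal sketch of what I mean (the CourseName column, the types, and the parameter names are illustrative assumptions, not our real schema):

```sql
-- Illustrative schema; actual columns and types differ.
CREATE TABLE dbo.Courses
(
    SiteID         INT           NOT NULL,
    CourseID       INT           NOT NULL,
    CourseName     NVARCHAR(100) NOT NULL,   -- placeholder payload column
    DeleteDateTime DATETIME      NULL,       -- NULL = active, non-NULL = soft-deleted
    CONSTRAINT PK_Courses PRIMARY KEY (SiteID, CourseID)
);

-- "Deleting" a course just stamps the column:
UPDATE dbo.Courses
SET    DeleteDateTime = GETDATE()
WHERE  SiteID = @SiteID AND CourseID = @CourseID;
```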

I can't use the SQL Server 2008 filtered index feature because we have to stay SQL Server 2005 compatible. Should I include "DeleteDateTime" in the clustered index for this table? If so, should it be the first column in the index (i.e. "DeleteDateTime, SiteID, CourseID")?

Does anyone have any reasons why I should or shouldn't follow this approach?

Thanks!

+2  A: 

Is there a chance you could transfer those "dead" records into a separate table? E.g. for your Courses table, have a Courses_deleted table or something like that, with an identical structure.

When you "delete" a record, you basically just move it to the "dead table". That way, the index on your actual, current data stays small and zippy....

If you need to have an aggregate view, you can always define a Courses_View which unions the two tables together.
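A minimal sketch of that idea (abbreviated to the key columns; the transaction wrapper and exact DDL are assumptions you would adapt to your real schema):

```sql
-- Archive table with the same key structure as the live table.
CREATE TABLE dbo.Courses_deleted
(
    SiteID         INT      NOT NULL,
    CourseID       INT      NOT NULL,
    DeleteDateTime DATETIME NOT NULL,
    CONSTRAINT PK_Courses_deleted PRIMARY KEY (SiteID, CourseID)
);

-- "Delete" = move the row, stamped with the deletion time, in one transaction.
BEGIN TRAN;
    INSERT INTO dbo.Courses_deleted (SiteID, CourseID, DeleteDateTime)
    SELECT SiteID, CourseID, GETDATE()
    FROM   dbo.Courses
    WHERE  SiteID = @SiteID AND CourseID = @CourseID;

    DELETE FROM dbo.Courses
    WHERE  SiteID = @SiteID AND CourseID = @CourseID;
COMMIT;

-- Aggregate view over live and dead rows.
CREATE VIEW dbo.Courses_View AS
    SELECT SiteID, CourseID, NULL AS DeleteDateTime FROM dbo.Courses
    UNION ALL
    SELECT SiteID, CourseID, DeleteDateTime FROM dbo.Courses_deleted;
```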

Your clustered index on your real table should be as small, static, and narrow as possible, so I would definitely NOT recommend putting such a datetime column into it. Not a good idea.
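In other words, leave the clustering key alone and, if many of your queries filter on deleted status, put that in a separate nonclustered index. A sketch, assuming the table from the question:

```sql
-- Keep the clustering key narrow and static: the existing PK columns only.
-- (If the primary key is already the clustered index, there is nothing to change.)
CREATE UNIQUE CLUSTERED INDEX CIX_Courses
    ON dbo.Courses (SiteID, CourseID);

-- Queries filtering on active/deleted status can use a separate
-- nonclustered index; the clustered key columns ride along implicitly.
CREATE NONCLUSTERED INDEX IX_Courses_DeleteDateTime
    ON dbo.Courses (DeleteDateTime);
```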

For excellent info on how to choose a good clustering key, and what it takes, check out Kimberly Tripp's blog entries on clustering keys and indexing.

Marc

marc_s
Marc, I've given a little bit of thought to that... either moving old records to a "History" table or even a separate database. My main concern, albeit a vague one, is that it will be more data to maintain. It also brings up questions such as whether I should enforce the same FKs on my history table as on my "live" table, etc. Thanks for the links to Kim Tripp's blogs. She's a fantastic resource.
Justin Swartsel
@Justin: yes, of course - if you need to partition your table into two parts - active and deleted - it is more work, no doubt. Keeping everything in the same place is easier. Maybe you just need to create a view on top of your table which only shows the active records (SELECT (fields) .... WHERE DeleteDateTime IS NULL) ? Maybe that would be good enough to solve your problem already....
marc_s
+2  A: 

What are your requirements on data retention? Have you looked into an audit log instead of keeping all non-current data in the database?

I think you've hit it right on the head with the composite indexes including your "DeleteDateTime" column.
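For the audit-log idea, a hand-rolled trigger might look roughly like this (the Courses_Audit table and its columns are assumptions; a real audit table would usually capture more of the row):

```sql
-- Assumed audit table; extend with whichever columns you need to retain.
CREATE TABLE dbo.Courses_Audit
(
    AuditID   INT IDENTITY(1,1) PRIMARY KEY,
    SiteID    INT      NOT NULL,
    CourseID  INT      NOT NULL,
    Action    CHAR(1)  NOT NULL,            -- 'U' = update, 'D' = delete
    AuditDate DATETIME NOT NULL DEFAULT GETDATE()
);
GO

CREATE TRIGGER trg_Courses_Audit
ON dbo.Courses
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- The "deleted" pseudo-table holds the pre-change values;
    -- a non-empty "inserted" means the statement was an UPDATE.
    INSERT INTO dbo.Courses_Audit (SiteID, CourseID, Action)
    SELECT d.SiteID, d.CourseID,
           CASE WHEN EXISTS (SELECT 1 FROM inserted) THEN 'U' ELSE 'D' END
    FROM   deleted d;
END;
```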

Chad
I'll have to look into the Audit log stuff more, Chad. Thanks.
Justin Swartsel
@Justin: I've found CodeSmith nice for generating auditing triggers for sql server. good luck
Chad
A: 

I would create a view that is basically

select {list all columns except DeleteDateTime}
from mytable
where DeleteDateTime is null

This is what I would use for all my queries on the table. The reason is to prevent people from forgetting to consider the deleted flag. SQL Server 2005 can easily handle this kind of view, and it is necessary if you are going to use this design for deleting records. I would have a separate index on the DeleteDateTime column. I likely would not make it part of the clustered index.

HLGEM
Using a view to abstract the raw data is definitely something to consider, but it doesn't really address the indexing part of my question. I suppose I might be able to index the view, but using "WITH (NOEXPAND)" on every join gets into that area where it's one more thing for developers to remember. Same issue as "WHERE DeleteDateTime IS NULL"...
Justin Swartsel