views:

204

answers:

7

I'm no DBA, but I respect database theory. Isn't adding columns like isDeleted and sequenceOrder bad database practice?

+2  A: 

SequenceOrder doesn't sound great (although you've given no background at all), but I've used columns like IsDeleted for soft deletions all my career.

Galwegian
+5  A: 

IsDeleted columns have two purposes.

  1. To hide a record from users instead of deleting it, thus retaining the record in the database for later use.

  2. To provide a two-stage delete process, where one user marks a record for deletion, and another user confirms.

Not sure what SequenceOrder is about. Do you have a specific application in mind?

Robert Harvey
There is a third practical purpose: When a user deletes a record, half a year later your'll still have a way to prove that they actually did it themselves, should they come to you and inquire where their stuff went.
Tomalak
+3  A: 

Absolutely not. Each database has different requirements, and based on those requirements, you may need columns such as those.

An example for isDeleted could be if you want to allow the user interface to delete unneeded things, but retain them in the database for auditing or reporting purposes. Or if you have incredibly large datasets, deleting is a very slow operation and may not be possible to perform in real-time. In this case, you can mark it deleted, and run a batch clean-up periodically.

An example for sequenceOrder is to enable arbitrary sorting of database rows in the UI, without relying on intrinsic database order, or sequental insertion. If you insert rows in order, you can usually get them out of order..until people start deleting and inserting new rows.

Dave Bauman
+11  A: 

That depends. Being able to soft-delete a tuple (i.e., mark it as deleted rather then actually deleting it) is essential if there's any need to later access that tuple (e.g., to count deleted things, or do some type of historical analysis). It also has the possible benefit, depending on how indexes are structured, to cause less of a disk traffic hit when soft-deleting a row (by having to touch fewer indexes). The downside is that the application takes on responsibility for managing foreign keys to soft-deleting things.

If soft deleting is done for performance, a periodic (e.g., nightly, weekly) tasks can clean soft-deleting tuples out during a low-traffic period.

Using an explicit 'sequence order' for some tuples is useful in several cases, esp. when it's not possible or wise to depend on some other field (e.g., ids, which app developers are trained not to trust) to order things that need to be ordered in some specific way for business reasons.

Dave W. Smith
+1 for 'tuple' and a great answer!
p.campbell
+1 good comments on the performance scenario. Lots of times data can be ordered for arbitrary, non-derivable reasons, and SequenceOrder makes perfect sense.
RedFilter
A: 

Backing up what others have said, both can have their place.

In our CRM system I have an isDeleted - like field in our customer table so that we can hide customers we are no longer servicing while leaving all the information about them in the database. We can easily restore deleted customers and we can strictly enforce referential integrity. Otherwise, what happens when you delete a customer but do not want to delete all records of the work you have done for them? Do you leave references to the customer dangling?

SequenceOrder, again, is useful to allow user-defined ordering. I don't think I use it anywhere, but suppose you had to list say your five favorite foods in order. Five tasks to complete in the order they need to be completed. Etc.

rotard
+1  A: 

Since you explicitly state that you're interested in the theoretical perspective, here goes :

At the level of the LOGICAL design, it is almost by necessity a bad idea to have a boolean attribute in a table (btw the theory's correct term for this is "relvar", not "table"). The reason being that having a boolean attribute makes it very awkward to define/document the meaning (relational theory names this the "Predicate") that the relvar has in your system. If you include the boolean attribute, then the predicate defining such a relvar's meaning would have to include some construct like "... and it is -BOOLEANATTRIBUTENAME here- that this tuple has been deleted.". That is awkward circumlocution.

At the logical design level, you should have two distinct tables, one for the non-deleted rows, and one for the deleted-rows-that-someone-might-still-be-interested-in.

At the PHYSICAL desing level, things may be different. If you have a lot of delete-and-undelete, or even just a lot of delete activity, then physically having two distinct tables is likely to be a bad idea. One table with a boolean attribute that acts as a "distinguishing key" between the two logical tables might indeed be better. If otoh, you have a lot of query activity that only needs the non-deleted ones, and the volume of deleted ones is usually large in comparison to the non-deleted ones, it might be better to keep them apart physically too (and bite the bullet about the probably worse update performance you'll get - if that were noticeable).

But you said you were interested in the theoretical perspective, and theory (well, as far as I know it) has actually very little to say about matters of physical design.

Wrt the sequenceOrder column, that really depends on the particular situation. I guess that most of the time, you wouldn't need them, because ordering of items as required by the business is most likely to be on "meaningful" data. But I could imagine sequenceOrder columns getting used to mimick insertion timestamps and the like.

+1 for the pragmatic statement about the relationship between theory and physical design (practicality).
Chris Lively
A: 

Others have adequately tackled isDeleted.

Regarding sequenceOrder, business rules frequently require lists to be in an order that may not be determined by the actual data.

Consider a table of Priority statuses. You might have rows for High, Low, and Medium. Ordering the the description will give you either High, Low, Medium or Medium, Low, High.

Obviously that order does not give information about the relationship that exists between the three records. Instead you would need a sequenceOrder field so that it makes sense. So that you end up with [1] High, [2] Medium, [3] Low; or the reverse.

Not only does this help with human readability, but system processes can now give appropriate weight to each one.

Chris Lively