views: 165
answers: 15

Everyone has accidentally forgotten the WHERE clause on a DELETE query and blasted some un-backed-up data once or twice. I was pondering that problem, and I was wondering whether the solution I came up with is practical.

What if, in place of actual DELETE queries, the application and maintenance scripts did something like:

UPDATE foo SET to_be_deleted=1 WHERE blah = 50;

And then a cron job was set to go through and actually delete everything with the flag? The downside would be that pretty much every other query would need to have "WHERE to_be_deleted != 1" appended to it, but the upside would be that you'd never mistakenly lose data again. You could see "2,349,325 rows affected" and say, "Hmm, looks like I forgot the WHERE clause," and reset the flags. You could even make the to_be_deleted field a DATE column, so the cron job would check to see if a row's time had come yet.
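Roughly, the cron job's side would then be a single statement (delete_after is just a made-up name for the DATE-column variant):

-- Run periodically: purge everything that's been flagged.
DELETE FROM foo WHERE to_be_deleted = 1;

-- DATE-column variant: only purge rows whose time has come.
DELETE FROM foo WHERE delete_after IS NOT NULL AND delete_after <= CURRENT_DATE;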

Also, you could remove DELETE permission from the production database user, so even if someone managed to inject some SQL into your site, they wouldn't be able to remove anything.
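For example (Postgres-style syntax; webapp_user is a placeholder for the production account):

-- The app user can still SELECT/INSERT/UPDATE, just not DELETE.
REVOKE DELETE ON foo FROM webapp_user;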

So, my question is: Is this a good idea, or are there pitfalls I'm not seeing?

+1  A: 

You could set up a view on that table that selects WHERE to_be_deleted != 1, and all of your normal selects are done on that view - that avoids having to put the WHERE on all of your queries.
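A sketch of that (the view name is made up):

CREATE VIEW foo_visible AS
    SELECT * FROM foo WHERE to_be_deleted != 1;

-- Normal reads go through the view instead of the table:
SELECT * FROM foo_visible WHERE blah = 50;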

Ken Ray
A: 

The pitfall is that it's unnecessarily complicated and someone will inadvertently forget to check the flag in their query. There's also the issue of potentially needing to delete something immediately instead of waiting for the scheduled job to run.

Ben Hoffstein
A: 

To avoid the to_be_deleted WHERE clause, you could create a trigger that fires before the DELETE command and inserts the deleted rows into a separate table. This table could be cleared out when you're sure everything in it really needs to be deleted, or you could keep it around for archival purposes.
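A sketch of that trigger, assuming a reasonably recent PostgreSQL (the foo_deleted table and archive_foo function names are made up):

-- Parallel table: same columns as foo, plus a deletion timestamp.
CREATE TABLE foo_deleted (LIKE foo, deleted_at timestamptz NOT NULL DEFAULT now());

CREATE FUNCTION archive_foo() RETURNS trigger AS $$
BEGIN
    INSERT INTO foo_deleted SELECT OLD.*, now();
    RETURN OLD;  -- returning OLD lets the DELETE go ahead
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_archive
    BEFORE DELETE ON foo
    FOR EACH ROW EXECUTE FUNCTION archive_foo();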

Aaron Smith
+3  A: 

That is fine if you want to do that, but it seems like a lot of work. How many people are manually changing the database? It should be very few, especially if your users have an app to work with.

When I work on the production db, I put EVERYTHING I do in a transaction, so if I mess up I can roll back. Just having that as a standard practice has helped me.

I don't see anything really wrong with it, though, other than that every single point of data manipulation in each application will have to be aware of this functionality and not just the data it wants.
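The pattern, as a quick sketch:

BEGIN;
DELETE FROM foo WHERE blah = 50;
-- Check the reported row count (and re-SELECT if in doubt), then:
COMMIT;
-- or, if the count looks wrong:
-- ROLLBACK;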

Arthur Thomas
What happens when people use (nolock) for speed and then you roll back your transaction....
Cervo
If you don't use locks where appropriate, then you are basically saying you do not care about consistency. If you care about consistency, then the price is a transaction with potential locks.
Arthur Thomas
A: 

You also get a "soft delete" feature, so you can give (certain) end-users the power of "undo" - there would have to be a pretty strong downside in the mix to cancel out the benefits of soft deleting.

micahwittman
+2  A: 

This would be fine as long as your application does not require that data be deleted immediately, since you have to wait for the next interval of the cron job.

I think a better solution and the more common practice is to use a development server and a production server. If your development database gets blown out, simply reload it. No harm done. If you're testing code on your production database, you deserve anything bad that happens.

+1  A: 

A lot of people have a delete flag or a row-status flag. But if someone is making a change through the back end (and they will be, since people often need batch changes that can't be accomplished through the front end) and they make a mistake, they will still often reach straight for DELETE. Ultimately, this is no substitute for testing the script before applying it to a production environment.

Also... what happens if the query "UPDATE foo SET to_be_deleted=1" gets executed because someone left off the WHERE clause? Unless you have auditing columns with a timestamp, how do you know which rows were flagged deliberately and which were flagged in error? And even if you do have auditing columns with a timestamp, if the auditing is done via a stored procedure or programmer convention, then these back-end queries may not supply the information that would tell you they were only just applied.

Cervo
+2  A: 

Too complicated. The standard approach to this is to do all your work inside a transaction, so if you screw up and forget a WHERE clause, then you simply roll back when you see the "2,349,325 rows affected" result.

Jim
A: 

The "WHERE to_be_deleted <> 1" on every other query is a huge one. Another is once you've ran your accidentally rogue query, how will you determine which of the 2,349,325 were previously marked as deleted?

I think the practical solution is regular backups, and failing that, perhaps a delete trigger that captures the tuples to be axed.

Pseudo Masochist
+2  A: 

It may be easier to create a parallel table for deleted rows. A DELETE trigger (and UPDATE too if you want to undo changes as well) on the original table could copy the affected rows to the parallel table. Adding a datetime column to the parallel table to record the date & time of the change would let you permanently remove rows past a certain age using your cron job.

That way, you'd use normal DELETE statements on the original table, so there's no chance you'll forget to run your special "DELETE" statement. You also sidestep the to_be_deleted != 1 expression, which is just a bug waiting to happen when someone inevitably forgets.
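The cron job's purge then stays a single statement (PostgreSQL interval syntax, assuming the parallel table carries a deleted_at timestamp and you keep 30 days of history):

-- Permanently drop archived rows past the retention window.
DELETE FROM foo_deleted WHERE deleted_at < now() - interval '30 days';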

yukondude
A: 

The other option would be to create a delete trigger on each table. When anything is deleted, it would insert that "to be deleted" record into another table, ideally named TABLENAME_deleted.

The downside would be that the db would have twice as many tables.

I don't recommend triggers in general, but it might be what you are looking for.

jinsungy
A: 

This is why, whenever you are editing data by hand, you should BEGIN TRAN, edit your data, check that it looks good (for instance, that you didn't delete more data than you were expecting), and then COMMIT TRAN. If you're using Postgres, then you want to create lots of savepoints as well, so that a typo doesn't wipe out your intermediate work.
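A sketch of that workflow in Postgres:

BEGIN;
DELETE FROM foo WHERE blah = 50;          -- check the rows-affected count here
SAVEPOINT after_delete;
UPDATE foo SET blah = 51 WHERE blah = 49;
-- Typo in the UPDATE? Undo just that step, keeping the DELETE:
ROLLBACK TO SAVEPOINT after_delete;
-- ...redo the UPDATE correctly, then:
COMMIT;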

But that said, in many applications it does make sense to have software mark records as invalid rather than deleting them. Add a last_modified date that is automatically updated, and you are all prepared to set up incremental updates into a data warehouse. Even if you don't have a data warehouse now, it never hurts to prepare for the future when preparing is cheap. Plus in the event of manual mistakes you still have the data, and can just find all of the records that got "deleted" when you made your mistake and fix them. (You should still use transactions though.)
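Sketching the schema side (PostgreSQL syntax; the column names are placeholders, and keeping last_modified current automatically would take a small BEFORE UPDATE trigger along the lines of the archive trigger sketched earlier):

ALTER TABLE foo
    ADD COLUMN is_active boolean NOT NULL DEFAULT true,
    ADD COLUMN last_modified timestamptz NOT NULL DEFAULT now();

-- "Deleting" becomes flagging:
UPDATE foo SET is_active = false WHERE blah = 50;

-- An incremental warehouse load picks up everything touched since the last run:
SELECT * FROM foo WHERE last_modified > '2008-10-01 00:00:00';  -- placeholder cutoff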

+2  A: 

It looks like you're describing three cases here.

  1. Case 1 - maintenance scripts. Risk can be minimized by developing them and testing them in an environment other than your production box. For quick maintenance, do the maintenance in a single transaction, and check everything before committing. If you made a mistake, issue the rollback command. For more serious maintenance that you can't necessarily wait around for, or do in a single transaction, consider taking a backup directly before running the maintenance job, so that you can always restore back to the point before you ran your script if you encounter serious problems.

  2. Case 2 - SQL Injection. This is an architecture issue. Your application shouldn't pass raw SQL into the database; access should be controlled through packages / stored procedures / functions, and values that come from the UI and end up in a DML statement should be applied using bind variables, rather than by creating dynamic SQL by appending strings together (see the sketch after this list).

  3. Case 3 - Regular batch jobs. These should have been tested before being deployed to production. If you delete too much, you have a bug, and are going to have to rely on your backup strategy.
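On the bind-variable point, a sketch using a server-side prepared statement (PostgreSQL syntax):

-- The user-supplied value travels as a parameter, never as part of the SQL text:
PREPARE delete_foo (int) AS
    DELETE FROM foo WHERE blah = $1;
EXECUTE delete_foo(50);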

Mike McAllister
A: 

Except: if you want to "undo" a delete, how do you know which flags to unset?

Kevin
+1  A: 

Everyone has accidentally forgotten the WHERE clause on a DELETE query and blasted some un-backed up data once or twice.

No. I always prototype my DELETEs as SELECTs, and only when the SELECT returns exactly the rows I want to delete do I change the part of the statement before the WHERE into a DELETE. This lets me inspect, in any detail needed, the rows I'm about to affect before doing anything.
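On the question's own example, the workflow looks like:

-- Prototype: inspect exactly which rows would be affected.
SELECT * FROM foo WHERE blah = 50;

-- Happy with what came back? Swap only the head of the statement:
DELETE FROM foo WHERE blah = 50;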

David Schmitt