views:

401

answers:

4

Story: today one of our customers asked us if all the data he deleted in the program was not recoverable.

Aside scheduled backups, we shrink the log file once a day, and we use the DELETE command to remove records inside our tables where needed.

Though, just for the sake of it, I opened the .mdf file with an editor (used PSPad), and searched for a particular unique piece of data -I was sure- was inside one of tables.

Problem: I tracked it in the file, then executed the DELETE command, and it was still there.

Question: Is there a particular command we are not aware of to delete the records physically form the disk?

Note: we know there are particular techniques to recover lost data from the hard drives, but here I am talking about a notepad-wannabe!

+4  A: 

SQL Server just marks the space of deleted rows as available, but does not reorganize the database and does not zero out the freed up space. Try to "Shrink" the database, and the deleted rows should no longer be found.

Thanks, gbn, for your correction. A page is the allocation unit of the database, and shrinking a database only eliminates pages, but does not compact them. You'd have to delete all rows in a page in order to see them disappear after shrinking.

cdonner
If you can provide the "Shrink" database function name I'll vote it as answer.
Alex Bagnolini
Use ALTER INDEX. DBCC SHRINKFILE is never a good idea for general use
gbn
DBCC SHRINKFILE can compact pages into gaps caused by deallocated pages, but won't reclaim space in a page for deleted row
gbn
@Alex: Are you trying to thoroughly erase data for security purposes or are you hoping you can recover data that was accidentally deleted? From your original question, I understood it to be the latter; in which case you definitely don't want to shrink the database.
Craig Young
PS: I don't think shrinking the database is guaranteed to clear the data from the page. If the page is in use by any other records, it will likely be left as is (even if it is copied to a new location on disk). If you do have a security concern, you're probably better of encrypting any data that needs to be secured. Your client should be more worried about someone accessing your database engine than about accessing the physical mdf file.
Craig Young
I need to ERASE the data completely.
Alex Bagnolini
You may want to update the row with empty fields before you delete it.
cdonner
@cdonner: That may work, but only if the update can update the record 'in-place'; otherwise the old values may still exist somewhere in unallocated space on a data page. (Ready to be overwritten.) You're more likely to get an in-place update if your text fields get new values of the same length as the old values. Note also that the LDF file may still contain old values.
Craig Young
+5  A: 

The text may still be there, but SQL Server has no concept of that data having any structure or being available.

The "freed space" is simply deallocated: not removed, compacted or zeroed.

The "Instant File Initialization" feature relies on this too (not zeroing the entire MDF file) and previous disk data is still available eben for a brand new database:

Because the deleted disk content is overwritten only as new data is written to the files, the deleted content might be accessed by an unauthorized principal.

Edit: To reclaim space:

ALTER INDEX...WITH REBUILD is the best way

DBCC SHRINKFILE using NOTRUNCATE can compact pages into gaps caused by deallocated pages, but won't reclaim space in a page for deleted row

gbn
Is there a non-manual way to Remove, compact or zero the "freed space"?
Alex Bagnolini
@alex: http://sqlserver2000.databases.aspfaq.com/how-do-i-reclaim-space-in-sql-server.html
0A0D
Thank you for the edits, links, advices.
Alex Bagnolini
`ALTER INDEX` seems to be working, but only if I reboot the service a couple of times. Is there any parameter I am missing? I am using `ALTER INDEX ALL ON Table REBUILD`.
Alex Bagnolini
No, it should just work. However, you can only rebuild an index and compact rows with a clustered index.
gbn
Just rebuilding the index, clustered or not will not guarentee the ghosted data is removed - since he was checking the original MDF file, we are dependant on hitting a checkpoint and the dirty page in memory being written back to the disk. Even then we also need the ghost process to come along and nuke the ghosted record from the page, since it is still sitting there as a valid ghost record.
Andrew
@Andrew: the ALTER index is something we can control, so I'd say it's still the best way
gbn
+2  A: 

If your client is concerned about data security it should use Transparent Database Encryption. Even if you obliterate information from the table, the record is still in the log. Even when log is recycled, the info is still in the backups.

Remus Rusanu
A: 

You could update the record with dummy values before issuing the delete, thereby overwriting the data on disk before the database marks it as free. (Whether this also works with LOB fields would warrant investigation, though).

And of course, you'd still have the problem of logs and backups, but I take it you already solved those.

meriton