views:

36

answers:

3

I have a MySQL table which basically serves as a file index. The primary key of each record is also the name of a file in a directory on my web host.

When the user wants to delete a file from the system, I want to ensure some kind of transaction safety, i.e. if something goes wrong while deleting the file the record is not erased, and if for some reason the database server dies the file won't be erased. Either event occurring would be very unlikely, but if there's even the slightest chance of a problem I want to prevent it.

Unfortunately I have absolutely no idea how to implement this. Would I need to work out which is less likely to fail, and simply assume that it never will? Are there any known best practices for this?

Oh and here's the kicker - my web host only supports MyISAM tables, so no MySQL transactions for me.

In case it matters, I'm using PHP as my server-side scripting language.

+2  A: 

In the circumstances, I think I'd use a logical deletion mechanism where a record is marked as deleted even though it is still present in the database, and the file is still present in the file system. I might move the file to a new location when it is logically deleted (think 'recycle bin') and update the record with the new location as well as the 'logically deleted' marker.

Then, sometime later, you could do an offline scavenge operation to physically delete files and records marked as logically deleted.

This reduces the risk to the live data. It slightly complicates the SQL, but a view might work - rename the main table, then create a view with the same name as the main table used to have, but with criteria that eliminate logically deleted records:

CREATE VIEW MainTable(...) AS
    SELECT * FROM RenamedTable WHERE DeleteFlag = 'N';

Even upgrading to a company that provides MySQL transactions is not a huge help. You would need a transaction manager which can run Two-Phase Commit protocols between the file system and the DBMS, which is non-trivial.

Jonathan Leffler
Very nice suggestion. I was expecting to have much more refactoring work ahead of me than that!
robinjam
+1  A: 

You can create a Status column (or an "is_active" column) in the File table with two values: 0=Active, 1=Deleted.

  • When a user deletes a file, only the Status field is changed and the file remain intact.
  • When a user browses files, only files with Status=0 are shown.
  • The Administrator can view/delete files with Status=1.
Kristoffer Bohmann
+2  A: 

Whether the file is "Deleted" from the DB via a UPDATE or a DELETE of a row, the problem is the same -- the database + file operations are not atomic. Neither an UPDATE or a DELETE are safer than the other, they're both transactions in a database whereas the file operation is not.

The solution is that there is never any conflict as to the state of the data. Only one source is considered "the truth" and the other reflects that truth. That way if there's ever an inconsistency between the two, you know what the "truth" is. In fact, there is never a "logical" inconsistency, only the aftermath manifested by physical artifacts on the disk.

In most cases, the Database is a better representation of The Truth.

Here's the truth table:

File Exists -- DB Record exists -- Truth
    Yes             No             File does not exist
    Yes             Yes            File does exist
    No              Yes            File does exist, but its in error.
    No              No             File does not exist

Operationally, here's how this works.

To create a file, copy the file to the final destination, then make an entry in the DB.

If the file copy fails, you don't update the DB. If the file copy succeeds, but the DB is not updated, the file "does not exist", so back to step one. If the file copy succeeds and the DB update succeeds, then everything is A-OK

To delete a file, first update the DB to show the file is deleted. If the DB update succeeds, then delete the actual file. If the DB update does not, then do not delete the file. If the file delete fails, no problem -- the file is still "deleted" because the DB says so.

If you follow the work flow, there's "no way" that the file should be missing while the DB says it exists. If the file goes missing, you have an undefined state, that you will need to resolve. But this shouldn't happen barring someone walking on your file system.

The DB transactions help keep things honest.

Occasionally, as Jonathan mentioned, you should run some kind of scavenging, syncing process to make sure there aren't any rogue files. but even then, that's really not an issue save for file space, especially if the file names of the actual files have nothing to do with the original file names. (i.e. they're synthetic files names) That way you don't have to worry about overwrites etc.

Will Hartung
The file names are generated from the primary key in the table specifically to prevent clashes. Thank you for the truth table - it made me consider the problem from a different perspective which is always helpful.
robinjam
Incidentally, I thought MySQL guaranteed transaction safety for single operations such as UPDATE and DELETE, i.e. either it works or it returns an error. Was I mistaken?
robinjam
No, you're correct. That's how this can work, because you always "know" the state of the database (since even on failure it will rollback), so it's a more reliable "source of truth". I simply point out there is no distinction between an update and a delete -- they're both equally safe as a means of recording the file state, and that's true because of the transactional nature of the database.
Will Hartung