I am working on an online file management project in which we store references in the database (SQL Server) and the file data on the file system. We are facing a coordination problem between the file system and the database when uploading a file, and also when deleting one: do we create the reference in the database first, or store the file on the file system first? The problem is that if we create the reference in the database first and then some error occurs while storing the file on the file system, the reference for that file exists in the database but there is no file data on the file system. Please give me some solution for how to deal with such a situation, and the reasoning behind it. I badly need it.

A: 

You need a transaction coordinator that supports both database and file system transactions in order to get the two-phase commit behavior you are describing to work.

You did not specify a database, programming language, or platform, so this is as much as I can put in an answer.

Oded
A: 

In Windows Vista, Windows Server 2008, or later Windows OS releases, you can use transactions to govern access to NTFS (Transactional NTFS, or TxF).

Using this facility, if you were to program in .NET, you could use the System.Transactions namespace to perform an update to the filesystem, and an update to the database, as one atomic unit.

I don't know if there are transactional filesystems on other OSes; that doesn't mean they don't exist.

Cheeso
A: 

This is actually a little easier than you think it is.

First, you need to decide the "single source of truth".

That is, either the file system or the DB is correct at any given point in time; which one is it?

The reason for this is that it makes it easier to resolve conflicts.

You should assume that the database is your Source, and that the file system is a shadow of the database. This seems counterintuitive: how can an entry exist in the DB if it's not in the file system? Obviously it can't. But, basically, if the file isn't in the DB, then "it doesn't exist" anyway. So, the file system reflects the state of the DB, not the other way around.

Given these assumptions, you end up with these conflict resolution rules.

For any given file:

State   File Exists   DB Entry Exists   Action
  1        Yes             Yes          No action, normal state
  2        No              Yes          Error -- missing file, "should never happen"
  3        No              No           No action, normal state
  4        Yes             No           Delete the file, but no error.

When uploading files, there is a grey area -- i.e. the window when a file has been uploaded but not yet acknowledged by the DB.

The way to solve this is to upload the file in a staging mode.

An easy way to do this is to upload the file to a different directory on the same physical file system, or to upload it to its final place using a temporary file name. Either way, the file is easily identifiable as being "in process" by its name or location.

You want to have this file "staged" on the same file system for two reasons. One, disk space. If the disk doesn't fill up when you upload, then you KNOW the file is going to fit in its final resting place (it has already "reserved" the space). Two, when you finally place the file, that operation must be atomic. File rename operations on the same filesystem are atomic on modern filesystems. Basically, you can't have the file "halfway renamed", even if the rename "overwrites" an existing file (the original is deleted during the rename operation).
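
As a minimal sketch of that staging step in Python (the readable binary upload stream and the ".uploading" suffix used to mark in-process files are my own illustrative choices, not something from the question):

    import os
    import tempfile

    def stage_upload(upload_stream, final_path, chunk_size=64 * 1024):
        # Create the temp file in the SAME directory as the destination, so the
        # later rename stays on one filesystem and therefore remains atomic.
        fd, staging_path = tempfile.mkstemp(
            dir=os.path.dirname(final_path) or ".", suffix=".uploading")
        try:
            with os.fdopen(fd, "wb") as staged:
                while True:
                    chunk = upload_stream.read(chunk_size)
                    if not chunk:
                        break
                    staged.write(chunk)
        except Exception:
            os.remove(staging_path)  # partial upload: throw it away
            raise
        return staging_path

The ".uploading" suffix is also what lets the reaper (described below) tell a staged file from an orphaned one.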

Once staged, your operation becomes:

Start DB transaction
Rename file
Add DB record
Commit transaction

If the rename action fails, you abort and roll back the DB transaction, so the entry never exists. If the rename succeeds and the DB fails? Then you have State #4, listed above. Retry the upload until it succeeds.
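
A hedged sketch of that sequence, using a generic Python DB-API connection (the files table, its columns, and the "?" placeholder style are assumptions for illustration; pyodbc against SQL Server happens to use "?"):

    import os

    def commit_upload(conn, staging_path, final_path, file_id, name):
        cur = conn.cursor()  # the DB transaction starts implicitly here
        try:
            # Rename first: if this fails, nothing has been inserted yet and the
            # rollback below is effectively a no-op.
            os.replace(staging_path, final_path)  # atomic on one filesystem
            cur.execute(
                "INSERT INTO files (id, name, path) VALUES (?, ?, ?)",
                (file_id, name, final_path))
            conn.commit()
        except Exception:
            conn.rollback()
            # If the rename succeeded but the insert/commit failed, the file on
            # disk is now State #4 (present, no DB row). Retry the upload; the
            # reaper collects the orphan if you eventually give up.
            raise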

To delete a file, do this:

Start DB Transaction
Delete DB record
Commit transaction
Delete file from file system

If the DB delete fails, you don't delete the file. If the DB delete succeeds, and the file deletion fails, then we're back to State #4.
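
The same pattern for deletion might look like this (again, table and column names are illustrative):

    import os

    def delete_file(conn, file_id):
        cur = conn.cursor()
        cur.execute("SELECT path FROM files WHERE id = ?", (file_id,))
        row = cur.fetchone()
        if row is None:
            return                      # nothing to delete
        path = row[0]
        cur.execute("DELETE FROM files WHERE id = ?", (file_id,))
        conn.commit()                   # if this fails, the exception propagates
                                        # and the file on disk is left untouched
        try:
            os.remove(path)
        except OSError:
            pass                        # file left behind: State #4, reaper's job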

Finally, you have a reaper process that regularly (daily, weekly, whatever) compares the DB to the file system, deleting any files that are not in the database. Since the DB is the "Single source of truth", the two stores will eventually be in sync.
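
A reaper along those lines, as a sketch (storage_root, the files table, and the ".uploading" staging suffix are my assumptions, and it presumes the DB stores the same absolute paths that os.walk yields; in practice you would also want a grace period, e.g. skip files modified in the last hour, so a file renamed just before its DB commit is not reaped mid-upload):

    import os

    def reap_orphans(conn, storage_root):
        cur = conn.cursor()
        cur.execute("SELECT path FROM files")
        known = {row[0] for row in cur.fetchall()}

        for dirpath, _dirs, filenames in os.walk(storage_root):
            for fname in filenames:
                if fname.endswith(".uploading"):
                    continue            # still being staged, leave it alone
                path = os.path.join(dirpath, fname)
                if path not in known:
                    os.remove(path)     # on disk but not in the DB: State #4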

If a file goes missing that has a DB record, then you have "data corruption". Don't do that. It's a bug, or someone is walking over your file system.

The retry characteristics of the upload process and the fast fail of the delete process give you a pseudo two-phase-commit process in which it is easy to check what's right and wrong, and easy to correct to the proper state.

Will Hartung