I accept file uploads from users. Each file has a pointer in the db that holds the file's location in the filesystem. Currently, I'm storing the files in the filesystem non-categorically, and each file is just named with a unique value. All categorisation, naming, etc. is done in the app using the db.

One factor I'm concerned about is file synchronization. If I wanted to set up file system synchronization where, for example, the user's files are automatically updated by bridging with a PC app, would this system still work well? I have no idea how such a system would work, so hopefully I can get some input.

Basically, is representing a file's name and location purely in the database optimal, especially if said file may be synchronized with a PC application?

+2  A: 

All you need to make such a system work is to make sure the API you use (or, more likely, create) can talk to the database and to the filesystem in a sensible way. Since this is what your site is already doing anyway, it shouldn't be hard to implement.

The mere fact that your files are given identifiers instead of plain-English names is mostly irrelevant with regard to remote synchronization.
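To illustrate why the opaque names don't matter: a sync client and the server only need to agree on a stable identifier per file and a way to detect changes. A minimal sketch, assuming each side can produce a manifest mapping the file's database id to a content hash, plus a copy of the manifest from the last successful sync (the manifest format, `SyncPlan`, and `plan_sync` are hypothetical, not part of any existing API):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    push: list[str] = field(default_factory=list)       # client state (edit or delete) wins
    pull: list[str] = field(default_factory=list)       # server state (edit or delete) wins
    conflicts: list[str] = field(default_factory=list)  # changed differently on both sides


def plan_sync(server: dict[str, str], client: dict[str, str],
              last_synced: dict[str, str]) -> SyncPlan:
    """Compare {file_id: content_hash} manifests against the state at the last sync.

    A missing id means the file does not exist on that side (new or deleted).
    """
    plan = SyncPlan()
    for file_id in sorted(server.keys() | client.keys()):
        base = last_synced.get(file_id)
        s = server.get(file_id)
        c = client.get(file_id)
        if s == c:
            continue  # both sides already agree
        server_changed = s != base
        client_changed = c != base
        if client_changed and not server_changed:
            plan.push.append(file_id)
        elif server_changed and not client_changed:
            plan.pull.append(file_id)
        else:
            plan.conflicts.append(file_id)
    return plan
```

Because the comparison is keyed on the database id and a content hash, the on-disk name never enters into it; the server resolves an id to its storage path only when it actually has to read or write bytes.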

msanford
+1  A: 

Store a file hash (e.g. SHA-1) in the database rather than a path, and have a separate database map the hash to the path. Write a small app that synchronizes the hash database, so that when you move your files to a different location it's easy to build a new database with the updated paths.

That way you can also have the system load the file from a different location depending on which hash database you use to locate it, which offers some transparency if you need people to be able to access the same file from different locations (e.g. NFS or WebDAV).
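A minimal sketch of the "small app" described above, assuming SQLite (via Python's built-in `sqlite3`) for the hash-to-path database; the table name `file_locations` and the helper names are hypothetical. Each storage location gets its own index, built by walking the directory tree and hashing every file, while the main application database only ever stores the SHA-1:

```python
import hashlib
import os
import sqlite3
from typing import Optional


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large uploads don't have to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def rebuild_location_index(conn: sqlite3.Connection, root: str) -> int:
    """Walk `root` and rebuild this location's hash -> path table.

    Returns the number of files scanned.
    """
    count = 0
    with conn:  # commit once at the end, or roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS file_locations "
            "(sha1 TEXT PRIMARY KEY, path TEXT NOT NULL)"
        )
        conn.execute("DELETE FROM file_locations")
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                conn.execute(
                    "INSERT OR REPLACE INTO file_locations (sha1, path) VALUES (?, ?)",
                    (sha1_of_file(full), full),
                )
                count += 1
    return count


def resolve(conn: sqlite3.Connection, sha1: str) -> Optional[str]:
    """Look up where this location keeps the file with the given hash."""
    row = conn.execute(
        "SELECT path FROM file_locations WHERE sha1 = ?", (sha1,)
    ).fetchone()
    return row[0] if row else None
```

To serve the same file from another location (an NFS mount, a WebDAV share), you would build a second index for that root and pass its connection to `resolve`. Byte-identical files share a hash, so `INSERT OR REPLACE` keeps only one path per hash; that deduplication is inherent to keying on content.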

David Holm
+5  A: 

Yes, the way you are doing this is the best way to do it. You are using a file system to store files and a database to store structured data.

One suggestion I would make is that you create a directory tree on the file system. You may one day run up against your file system's limit on the number of files per directory. I have built systems that create a new subdirectory for each day or week.
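A minimal sketch of the per-day/per-week layout; the `YYYY/MM/DD` and `YYYY/week-NN` directory naming, the `storage_path` helper, and the use of a random UUID as the file name are assumptions for illustration:

```python
import os
import uuid
from datetime import date
from typing import Optional


def storage_path(root: str, today: Optional[date] = None, weekly: bool = False) -> str:
    """Return a fresh path for an upload under a per-day (or per-ISO-week) directory."""
    today = today or date.today()
    if weekly:
        year, week, _weekday = today.isocalendar()
        subdir = os.path.join(root, f"{year:04d}", f"week-{week:02d}")
    else:
        subdir = os.path.join(root, f"{today.year:04d}",
                              f"{today.month:02d}", f"{today.day:02d}")
    os.makedirs(subdir, exist_ok=True)
    # Opaque unique name, as in the question; the database keeps the real filename.
    return os.path.join(subdir, uuid.uuid4().hex)
```

Whether daily or weekly buckets are enough depends on your upload rate and on the particular file system's limits, so it's worth estimating the peak number of files per bucket before choosing.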

Make sure you have good backups of the database as well as the document repository.

Jim Blizard
A: 

The Boring Answer™:

I think it depends on what you wanna do, as always :)

I mean take your regular web hosting company. Developers are syncing files to web servers all the time. Would it make sense for a web server to store hash-generated file names in a db that pointed to physical files? No. Then you couldn't log in with your FTP client and upload files directly, and you'd have to code a custom module just to get Apache to serve them, etc. Instant headache.

Does it make sense for Flickr to use a db? Yes, absolutely! (Then again, you can't log in with an FTP client and manage your photos, and that's probably a good thing!)

Just remember, a file system is a (very simple) db too. And it's a db that comes with a lot of useful free tools.

my 2¢

/0
0scar
+1  A: 

We use exactly this model for file storage, along with (shameless plug) SabreDAV to make it seem to the end-user it's a normal filesystem.

I think this is a perfectly fine model: as long as the way to look up a file is documented and easy to follow, there shouldn't be an issue. Just make backups of your DB :)

One other piece of advice: we use md5() on the file id to generate a unique filename, and we use parts of that hash to generate a directory structure. For example, id 1 will yield b026324c6904b2a9cb4b88d6d61c81d1, and the resulting filename will become:

b02/632/4c6/904b2a9cb4b88d6d61c81d1

The reason for this is that most stable filesystems can become very slow once a single directory holds a high number of files (or directories). It's much, much faster to traverse a few subdirectories.
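A minimal sketch of that hashed layout in Python (the answer uses PHP's `md5()`; `sharded_path`, its `depth`/`width` parameters, and the `store` root are hypothetical). MD5 is used here only to spread files evenly, not for security: three hex characters give 16³ = 4096 possible directories at each level. The digest depends on exactly which bytes are hashed (for instance, whether a trailing newline is included), so pick one encoding and stick to it; this sketch hashes the decimal id with no newline and so will not necessarily reproduce the digest quoted above.

```python
import hashlib
import os


def sharded_path(root: str, file_id: int, depth: int = 3, width: int = 3) -> str:
    """Map a numeric file id to root/xxx/xxx/xxx/<rest of the MD5 hex digest>."""
    digest = hashlib.md5(str(file_id).encode("ascii")).hexdigest()
    # First `depth` chunks of `width` hex chars become directories...
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    # ...and the remainder of the digest is the file name.
    parts.append(digest[depth * width:])
    return os.path.join(root, *parts)
```

Because the path is a pure function of the id, nothing beyond the id has to be stored to find the file again; the human-readable name stays in the database, as in the question.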

Evert