Hey Everyone,

I’m working on a project and one of the requirements is document uploading and viewing. I have decided on storing the documents on the web server rather than in the database. The question I have is this: what is a good approach to storing a large number of documents on the server? The uploaded documents will be associated with a master record ID in the database, so I thought about creating a folder named with the date of the master record, a subfolder for the record ID, and then prefacing each file name with the ID, like this:

2009-4-3
 234
  234-document.pdf
  234-courtrecord.pdf

Does this seem solid and intuitive? I added the ID to the file name in case a file ever gets moved out of its folder for any reason. Would there be any disadvantage to storing them this way?
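
In code, I'm picturing something like the rough sketch below (class and method names are just placeholders, not final code):

    using System;
    using System.IO;

    class UploadStorage
    {
        private readonly string _root;  // root upload folder, outside the web root

        public UploadStorage(string root)
        {
            _root = root;
        }

        // Build the target path for an uploaded file and make sure the folders exist.
        public string GetTargetPath(int masterId, DateTime recordDate, string fileName)
        {
            // Date folder, then one subfolder per master record ID.
            // Zero-padded date format so folders sort cleanly (my example above used 2009-4-3).
            string folder = Path.Combine(
                Path.Combine(_root, recordDate.ToString("yyyy-MM-dd")),
                masterId.ToString());
            Directory.CreateDirectory(folder);  // no-op if it already exists

            // Preface the file name with the record ID, e.g. 234-document.pdf.
            return Path.Combine(folder, masterId + "-" + fileName);
        }
    }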

Just looking for feedback and maybe a better solution.

A: 

What about a content management system like OpenCMS?

Check this list for others.

John Ellinwood
I don't need a CMS for this application, as I want to keep this simple and lightweight.
chopps
What language and framework are you using? There are all levels of CMS, from simple libraries to full applications. Something out there may make your system extensible in the future while still staying out of your way.
John Ellinwood
I'm using ASP.NET.
chopps
+1  A: 

Is there any value in having the files stored in separate, dated folders as opposed to one large repository? I would recommend using a single-directory repository and having a general file handler. Over time it will prove to be useful.

Prepending the record ID to the file name is a good idea if it's relevant to the file. I've often prepended a timestamp.

We've built several file-storage systems using native FTP functions within the language (PHP) together with the timestamp-prepend method, and it's proven to be very effective.
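
Our systems are PHP, but in your stack (C#) the timestamp-prepend idea would look roughly like this minimal sketch (names are illustrative):

    using System;
    using System.IO;

    class FlatRepository
    {
        private readonly string _dir;  // the single repository directory

        public FlatRepository(string dir)
        {
            _dir = dir;
        }

        // One flat directory; the timestamp + record ID prefix keeps names
        // unique and sortable, e.g. 20090403153012-234-document.pdf.
        public string BuildPath(int recordId, string originalName)
        {
            string stamp = DateTime.UtcNow.ToString("yyyyMMddHHmmss");
            return Path.Combine(_dir, stamp + "-" + recordId + "-" + originalName);
        }
    }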

jerebear
One reason for the dated folders would be file viewing of the directory, whether from a web page or an FTP app. If they were all dumped into a single directory, loading all the files could time out a server, etc.
chopps
It could, you're right. I think most FTP clients tend to stop after about 4,000 files. I'm thinking mostly of the ease of finding the files programmatically; if they're spread across multiple directories, especially if they get re-uploaded, that could be cumbersome.
jerebear
+1  A: 

I would utilize a flat file system, naming the files via some (generated?) label and record ID combination. No need for folders. If you want to implement versioning, you could add a datetime stamp. I've seen a couple of systems built this way that work out just fine.

If you do choose to use a file-folder system, be very pragmatic with your folder naming and placement. You could also combine the two options.

Personally, I would separate your storage process/implementation from your application/access processes and not let one decision drive the other. Use application logic to access the files. Store them as efficiently as possible.

For example, you could store your files centrally however you want and then use a semantic layer to find your files multiple ways (name, topic, type, etc.). Think of how a wiki system works.
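
As a minimal sketch (in C# to match your stack; all names are made up), the semantic layer could be little more than a metadata index that stands between your application and the generated file names on disk:

    using System;
    using System.Collections.Generic;

    // Files live on disk under generated names; every lookup goes through
    // this metadata index rather than through the folder structure.
    class DocumentIndex
    {
        class Entry
        {
            public string StoredName;  // generated name on disk, e.g. GUID-based
            public int RecordId;
            public string Topic;
            public string Type;        // e.g. "pdf"
        }

        private readonly List<Entry> _entries = new List<Entry>();

        // Register a document and hand back the name to store it under.
        public string Add(int recordId, string topic, string type)
        {
            string storedName = Guid.NewGuid().ToString("N") + "." + type;
            _entries.Add(new Entry
            {
                StoredName = storedName,
                RecordId = recordId,
                Topic = topic,
                Type = type
            });
            return storedName;
        }

        // Find files multiple ways, independent of how they sit on disk.
        public IEnumerable<string> ByRecord(int recordId)
        {
            foreach (Entry e in _entries)
                if (e.RecordId == recordId)
                    yield return e.StoredName;
        }

        public IEnumerable<string> ByTopic(string topic)
        {
            foreach (Entry e in _entries)
                if (e.Topic == topic)
                    yield return e.StoredName;
        }
    }

In practice the index would be a table in the database you already have; the point is that the storage layout and the access paths stay decoupled, so either can change without breaking the other.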

Joshua