I'm currently developing a web application whose primary function is letting users upload and download files. The files will be stored on the hard disk (no cloud storage yet).

Taking into consideration the possibility of gigabytes of data and a large number of files, do I need to organize files into subfolders to speed up fetching a file, or is the file system's indexing already efficient enough that I can ignore this potential bottleneck?

Update:

On a side note, I plan to store file names and any additional information in a SQL database and only query the disk when a user actually wants to download the file. This is how I plan on retrieving files:

FileStream stream = File.Open(@"C:\file.txt", FileMode.Open);
byte[] fileContent = new byte[stream.Length];
stream.Read(fileContent, 0, fileContent.Length);

Any file information will be retrieved from the database. The hard disk will only be used for saving and fetching files.

Update 2:

Files will be saved as GUID + EXTENSION on the hard disk while the actual file name is stored in the database.
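
For reference, a minimal sketch of how the save step could look, assuming an ASP.NET HttpPostedFile and a hypothetical UploadRoot folder (the class and names here are illustrative, not from the original post):

using System;
using System.IO;
using System.Web;

public static class FileStore
{
    // Hypothetical storage root; point this at the real upload folder.
    private static readonly string UploadRoot = @"D:\FileStore";

    // Saves the upload under a GUID-based name and returns that name,
    // so the original file name can be kept in the database record.
    public static string Save(HttpPostedFile upload)
    {
        string extension = Path.GetExtension(upload.FileName);
        string storedName = Guid.NewGuid().ToString() + extension;

        string fullPath = Path.Combine(UploadRoot, storedName);
        upload.SaveAs(fullPath);

        return storedName;
    }
}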

+2  A: 

Yes, you need to further subdivide the files to save the time spent enumerating a directory, though how much you gain from this may depend on the OS you're using. Windows is quite slow when you need to ask for a single file among hundreds in a folder; I believe this is because it tries to read all attributes of all files when it has to search through them. Additionally, for this type of application, you may need to worry about file versions, file upload timeouts, virus-infected files, hiding the real file path from end users, unsupported MIME types, etc.
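
As one possible approach (a sketch only, building on the GUID naming from the question; the folder layout and names are assumptions, not something specified above), the first characters of the GUID can be used as subfolder buckets so that no single directory grows too large:

using System.IO;

public static class FileStorePaths
{
    // Hypothetical storage root; point this at the real upload folder.
    private static readonly string UploadRoot = @"D:\FileStore";

    // Maps a stored name like "cbacd260-10ec-....pdf" to
    // D:\FileStore\cb\ac\cbacd260-10ec-....pdf, spreading files
    // across many small directories.
    public static string GetPath(string storedName)
    {
        string level1 = storedName.Substring(0, 2);
        string level2 = storedName.Substring(2, 2);

        string folder = Path.Combine(Path.Combine(UploadRoot, level1), level2);
        Directory.CreateDirectory(folder); // does nothing if it already exists

        return Path.Combine(folder, storedName);
    }
}

With two hex characters per level this gives 256 buckets per level (65,536 leaf folders), which keeps directory listings small even with millions of files.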

Cahit
+2  A: 

Adding to what @cahitbox said, it goes further than that. If you expect more than a couple of concurrent users, you should have multiple disks so that you can retrieve multiple files concurrently (disks are slow).

BioBuckyBall
A: 

I think you also need to take into account the following questions:

  • Will a list of files be displayed to the user, or will the user access a file through a direct link?
  • Will you need to do backups?
  • Are you going to use a database for storing additional information, or are you going to use the file system only?
  • Does your application have any kind of security or permissions?
  • What performance must the application have (number of concurrent reads and writes, response time, upload/download speed)?
  • Do you need any kind of search?
  • Do you need to store the original file name?
STO
A: 

If the file "metadata" is stored in a database, you can just name the files with a GUID and their extension. The simplest way to give them back to the users is to store them directly inside you web application, so they are available through simple urls, if security constraints are not too tight :

http://my.web.site/files/cbacd260-10ec-4377-bd19-25daa1fd0fe2.pdf

If you really want to serve your files through an HttpHandler, I'd use:

Response.TransmitFile(Server.MapPath("path/to/files/cbacd260-10ec-4377-bd19-25daa1fd0fe2.pdf"));

Documentation here: http://msdn.microsoft.com/en-us/library/12s31dhy%28VS.80%29.aspx
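
Putting that together, a minimal handler sketch might look like the following; the query-string parameter, virtual folder, and file-name lookup are assumptions for illustration, and the requested name should be validated against the database record before anything is served:

using System.Web;

public class DownloadHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Hypothetical parameter holding the stored GUID + extension;
        // validate it against the database before using it in a path.
        string storedName = context.Request.QueryString["id"];
        string fullPath = context.Server.MapPath("~/files/" + storedName);

        string originalName = "..."; // look up the real file name in the database

        context.Response.ContentType = "application/octet-stream";
        context.Response.AddHeader("Content-Disposition",
            "attachment; filename=\"" + originalName + "\"");
        context.Response.TransmitFile(fullPath);
    }
}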

The expected number of users is also very important: 30 users a day aren't the same as 30,000. File volume matters too: you talk about gigabytes, but you won't manage 30 GB the same way you manage 300 GB.

For the physical storage of files, try to avoid putting too many files (2,500+ in my opinion) in the same directory. Usually, for file upload sites, files are "grouped" logically anyway, so you can map each group to a subdirectory.

mathieu
That's how I have it set up currently. Users upload files; the file name and extension are stored in the database, but the files are saved with a GUID + extension on the hard disk.
Baddie