views:

581

answers:

7

I have an upcoming project where I will need to handle very large uploads from browsers (either the classic input type="file" or a Java applet), and I'm looking for the best tool for the job on the server.

These are the things I need:

  • low memory consumption on the server
  • ability to save the file to its final destination on the server (no copying the file around)
  • no blocking of other important tasks done by the web server
  • good handling of files up to 2 GB
  • authorization of files (permissions would be granted in the app)

I still have some latitude on what technology to use, so I would like some advice to help me choose the best server-side technology for this task:

  • ASP.NET ?
  • Java ?
  • Amazon S3 ?
  • Other choices ?

I'm more used to the Microsoft stack, but willing to change if necessary: as said above, I'm just looking for the best tool for the job.

Thanks!

Update : The server side is the part I'm really interested in for this question, not the client side.

It looks like it may be trivial, but when you start to dig a bit you see the 4 MB limit in .NET, uploads that use a lot of memory and that CAN block other threads (when you have a limit on the number of threads, and a thread can run for the entire duration of a 2 GB file upload/download over the internet, this isn't going to scale very well, is it?), etc.

A: 

You can do it using asynchronous file upload in ASP.NET, with Flash to present it to the user.

http://swfupload.org/node/47

Pino
A: 

You could use Uploadify. I have used it several times before and it has always suited my needs. It is an asynchronous file uploader that uses Flash to allow multiple files to be uploaded at once if desired.

jmein
A: 

You can use BITS for uploading:

http://www.simple-talk.com/dotnet/.net-tools/using-bits-to-upload-files-with-.net/

kirkus
Note: BITS limits you to Windows-only *clients*; with the growing popularity of Macs, this may be an issue.
Piskvor
+4  A: 

You'll need:

  • Client-side code (Java applet, Silverlight, etc.) to break files into small chunks
  • Server-side code (ASP.NET, Java, it doesn't matter) to reassemble those files

I just finished an application exactly like that; I used Silverlight (async WebRequest), ASP.NET (IHttpHandler/IHttpAsyncHandler) and SQL Server 2005 (UPDATETEXT/READTEXT) for file storage.

UPDATE: About ASP.NET server-side code:

ASP.NET's default configuration allows 100 threads per processor; an IHttpAsyncHandler won't block your worker threads, and there you can write your file content directly to context.Response.OutputStream.

For upload, you'll also send several data chunks, but over multiple HTTP connections; while this adds some overhead due to the HTTP headers, it worked very well in my tests.

Rubens Farias
I have no choice but to also support input type="file" (tight security at some customers', weird proxies, etc.) as a last-resort option. So the server-side code must be able to handle a 2 GB file coming in one piece. Will IHttpHandler/IHttpAsyncHandler be able to handle this?
Sébastien Nussbaumer
I use WebClient (HTTP) in the Silverlight application, so I don't need extra open ports, but I agree that installing the plugin can be a problem.
Rubens Farias
About the 2 GB file: HTTP works by receiving all the data before processing starts; so, as long as you have 2 GB of memory available on the server for EVERY concurrent user, you'll be OK =) You'll also need to change the `<httpRuntime maxRequestLength="" />` setting in your web.config.
Rubens Farias
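For reference, a web.config sketch with illustrative values for a 2 GB cap (these numbers are my own, not from the thread): maxRequestLength is measured in KB, and its default of 4096 is where the 4 MB limit mentioned in the question comes from; on IIS7+ request filtering has its own limit, in bytes.

```xml
<!-- Illustrative values only: raise the request size limit to ~2 GB.
     maxRequestLength is in KB (default 4096 = the 4 MB limit);
     executionTimeout (seconds) usually needs raising too for slow links. -->
<system.web>
  <httpRuntime maxRequestLength="2097152" executionTimeout="3600" />
</system.web>
<!-- On IIS7+, request filtering applies its own cap, in bytes: -->
<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="2147483648" />
    </requestFiltering>
  </security>
</system.webServer>
```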
Bottom line: input type="file" won't work for 2 GB uploads.
Rubens Farias
OK, I remember that I could upload files up to 2 GB in classic ASP with input type="file" using the SoftArtisans FileUp component on the server. But it looks as though the component hasn't been updated in 5 years: no .NET-specific version, it's still COM, 32-bit... If I could do without it, that would be a big +.
Sébastien Nussbaumer
And it would stream the file directly to disk, without putting the whole file in RAM.
Sébastien Nussbaumer
A: 

On the client side, input type="file" through HTTP POST has its shortcomings: notably, it can't compress uploads (probably not an issue), nor can it resume transfers (which can be painful when a 1000 MB upload fails at 990 MB). SWFUpload, although great in other respects, relies on the browser's HTTP POST implementation.

I'd probably go with a Java applet on the client; this would allow you to establish the connection and check for the necessary permissions before uploading, although that path has its problems too:

  • FS access permissions (signed applet?)
  • writing your own HTTP uploader
  • proxy handling

I'd also give the option to fall back to plain old HTTP POST.

The server side can be written in pretty much anything, as long as you can process the data as it arrives (i.e. don't wait until you have the whole file).

Piskvor
+2  A: 
  • low memory consumption on the server

Write input directly to disk instead of to memory.
In Java terms, use FileOutputStream/BufferedOutputStream.

  • ability to save the file in its final destination on the server (no copying the file around)

See above.

  • no blocking of other important tasks done by the webserver

Each request runs in its own thread, so there's nothing to worry about. It just depends on how you code it all.

  • good handling of files up to 2 gb

Non-issue when writing file to disk directly.
In Java terms, you can use Apache Commons FileUpload API for this.

  • authorization of files (permissions would be given in the app)

Not sure which level of authorization you're talking about. Disk file system level? Web application level? Client level?
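The streaming points above (constant memory, write straight to the final destination) can be sketched in plain Java. The servlet and Commons FileUpload wiring is omitted here, so `saveTo` is a hypothetical helper; in a servlet you would pass it request.getInputStream() or a FileItemStream's stream:

```java
import java.io.*;

public class StreamToDisk {
    // Copies an upload stream to its final destination in small buffers,
    // so memory use stays constant no matter how large the file is.
    public static long saveTo(InputStream in, File dest) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(dest))) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a 1 MB upload arriving from an in-memory stream.
        byte[] payload = new byte[1024 * 1024];
        File dest = File.createTempFile("upload-", ".bin");
        dest.deleteOnExit();
        long written = saveTo(new ByteArrayInputStream(payload), dest);
        System.out.println(written + " " + dest.length());  // prints "1048576 1048576"
    }
}
```

Since the file is written directly where it belongs, no second copy or rename pass is needed afterwards.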

BalusC
Clarification: web application level; only authorized users of the website should be able to download the files.
Sébastien Nussbaumer
In Java terms: implement a login/authorization mechanism and check if the user is logged in and authorized.
BalusC
A: 

A few other quick notes based on my experience...

  1. input type="file" won't work reliably for large files at all (due to memory)
  2. Some file upload components solve the memory issues, but they still have problems when bytes are lost in transmission between the client and the server.
  3. You should look at a Java applet that supports chunking the data.

The way it should work: the Java applet breaks the file into manageable chunks and computes a hash of each chunk's bytes. Once the server receives a chunk, it compares the hash of the bytes it received with the hash provided by the applet.

If the hashes don't match, retry the chunk. If they do match, move on to the next chunk. Then use a tool to reassemble the chunks.
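A minimal sketch of that per-chunk verification, using the standard MessageDigest API (the choice of SHA-256, and doing this at all given TCP's own checksums, are assumptions; see the comment below this answer):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChunkVerify {
    // Hex-encoded SHA-256 of one chunk; the applet sends this alongside
    // the chunk, and the server recomputes it over the bytes it received.
    static String sha256Hex(byte[] chunk) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(chunk);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Server-side check: ask the client to retry the chunk if this is false.
    static boolean chunkIntact(byte[] received, String clientHash) throws NoSuchAlgorithmException {
        return sha256Hex(received).equals(clientHash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] chunk = "some chunk of the file".getBytes();
        String clientHash = sha256Hex(chunk);            // computed by the applet
        System.out.println(chunkIntact(chunk, clientHash));   // prints "true"

        byte[] corrupted = Arrays.copyOf(chunk, chunk.length);
        corrupted[0] ^= 1;                               // flip one bit "in transit"
        System.out.println(chunkIntact(corrupted, clientHash)); // prints "false"
    }
}
```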

Brian
I used to compute hashes too, but I dropped it to free some client CPU: the underlying transport already has its own error-detection mechanisms, so you don't need to worry about it.
Rubens Farias