I am creating an application where I want to upload huge files. Here is a brief description of what this application tries to achieve:

  1. Create a VMDK file from the user's physical machine (using the VMware Converter tool; VMDK files can be GBs in size).
  2. Upload this VMDK file to the remote server.
  3. The purpose of having the VMDK file on the remote server is accessibility.
  4. That is, a user away from his physical machine can later log in via a web console and instantiate a virtual machine from this VMDK on the remote server.

I think this makes the situation different from normal file uploads (10-20 MB files).

rsync/scp/sftp might help, but would this be possible using a web interface?

If not, do I need to create a separate client for the end user to convert and upload his files efficiently?

Any help is appreciated.

+5  A: 

Use a file transfer protocol for this, not HTTP. You need a protocol that can restart the transfer in the middle in case the connection breaks.


BTW, I don't mean you should use FTP.


I'm not an expert on all the current file transfer protocols (I've been an FTP expert, which is why I recommend against it).

However, in this situation, I think you're off base in assuming you need transparency. All the users of this system will already have the VMware Converter software on their machines. I see no reason they couldn't also have a small program of yours that does the actual upload. If there's an API to the Converter software, then your program could automate the entire process: they'd run your program before they go home for the night, your program would convert to the VMDK, then upload it.

Exactly which protocol to use, I don't know. That might take some experimentation. However, if the use of the protocol is embedded within your small application and in the service, then your users will not need to know which protocols you're experimenting with. You'll be able to change them as you learn more, especially if you distribute your small program in a form that allows auto-update.
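For illustration only, here is a minimal sketch of what such a small client could look like, in Python with the third-party paramiko library for SFTP. The converter command line, host name, and paths are all hypothetical; a real VMware Converter integration would replace the convert() stub.

    import subprocess

    import paramiko

    LOCAL_VMDK = "machine.vmdk"                  # hypothetical local file
    REMOTE_PATH = "/uploads/machine.vmdk"        # hypothetical remote path
    HOST, USER = "uploads.example.com", "aman"   # hypothetical server
    CHUNK = 1024 * 1024                          # transfer 1 MiB at a time

    def convert():
        # Stand-in for whatever the VMware Converter API/CLI actually offers.
        subprocess.run(["vmware-converter", "--out", LOCAL_VMDK], check=True)

    def upload_resumable():
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(HOST, username=USER)
        sftp = client.open_sftp()

        # Resume support: skip whatever a previous, interrupted run
        # already delivered to the server.
        try:
            done = sftp.stat(REMOTE_PATH).st_size
        except IOError:
            done = 0

        with open(LOCAL_VMDK, "rb") as src, sftp.open(REMOTE_PATH, "a") as dst:
            src.seek(done)
            while True:
                block = src.read(CHUNK)
                if not block:
                    break
                dst.write(block)

        sftp.close()
        client.close()

    if __name__ == "__main__":
        convert()
        upload_resumable()

The user runs it once before leaving for the night; if the connection drops, running it again picks up where the last attempt stopped.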

John Saunders
Except I'd recommend SFTP/SCP, not FTP. FTP is a pain to set up firewalls for. In addition, SCP can compress files on the fly.
Vladimir Dyuzhev
@Vladimir: thanks. I've clarified that I don't mean FTP.
John Saunders
But is it possible to use rsync/sftp/scp transparently underneath the web page for uploading the files?
Aman Jain
@Aman: transparency is nice. Getting your files uploaded in a reasonable amount of time is much nicer.
John Saunders
@John: I've added more details now, please advise.
Aman Jain
Aman prefers a web interface, John. There are several uploaders available that do this through a web page but don't necessarily use HTTP POST to achieve it.
Matt H
+1  A: 

Rsync would be ideal if you can find a host that supports it.
It can restart easily, retransfer only the changed parts of a file if that's useful to you, and has built-in options to use SSH, compression, etc.

It can also confirm that the remote copy matches the local file without transferring very much data.
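As a sketch of how a client-side wrapper might drive it (host name and paths are hypothetical; the rsync flags are standard):

    import subprocess

    # --partial keeps interrupted transfers on the server so a rerun can
    # resume; -z compresses on the wire; -e ssh carries the transfer over
    # SSH; -c makes rsync compare file contents by checksum rather than
    # by size/timestamp, which is what confirms the remote copy matches.
    subprocess.run(
        [
            "rsync",
            "--partial",
            "--progress",
            "-z",
            "-c",
            "-e", "ssh",
            "machine.vmdk",
            "aman@uploads.example.com:/uploads/machine.vmdk",
        ],
        check=True,
    )

The same command line works directly from a shell; the wrapper only matters if you fold it into a larger client application.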

Martin Beckett
But is it possible to use rsync/sftp/scp transparently underneath the web page for uploading the files?
Aman Jain
You are writing a local app (on the client machine?) and you have control of the server, so just install and launch rsync and have an rsync server on the other end. You can tell it to use port 80, if that's what you mean by "under the web".
Martin Beckett
A: 

I would run parallel FTP streams to speed up the process...

Mikos
That's a joke, right?
P Daddy
-1: I'm assuming it's not a joke. If you mean it, then you need to elaborate so I can stop laughing / gagging.
John Saunders
You may be laughing, but I have actually done this (and I'm still laughing about it). For some reason FileZilla Server was throttling individual connections.
jleedev
The best attack vector then is learning how to configure your software, not parallelizing the operation.
mnemosyn
Actually, we got the client to launch an SFTP server instead.
jleedev
I doubt you have, jleedev...
Coronatus
@P Daddy: why is this a "joke"? This is BitTorrent in reverse - I have done it and it works fab. @John - you might need to take some anti-nausea meds.
Mikos
Bittorrent works by uploading to multiple computers at once. If a file transfer to a *single* server speeds up when you parallelize it, something is dreadfully wrong.
jleedev
@Mikos: the idea of using multiple unreliable FTP links, in parallel, is interesting to say the least.
John Saunders
@John, now that you (hopefully) have had some anti-nausea meds... here is something that might be of interest to you: http://bit.ly/bIld5O. It might bring you to your senses (plus get you off your high horse). Suggest you acknowledge the idiocy of your position.
Mikos
@jleedev - try before you cry. :-)
Mikos
@Mikos: why would that be interesting? I know FTP too well to consider that suitable for uploading 10GB files over possibly unreliable links. FTP just wasn't designed for that.
John Saunders
A: 

Are you aware how long such an upload would take over most upstream connections, even without worrying about which protocol to use?

Konrad Neuwirth
A: 

Try one of these solutions.

Or if you're keen, roll your own. It'd be nice to have a Flash-based rsync client: you'd run an rsync server on the web host and upload with the rsync client downloaded with the browser.
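For the server end of that design, here is a hedged sketch of launching an rsync daemon from Python; the module name, path, and port are assumptions (873 is rsync's default, though it could equally be told to listen on 80, and either of those privileged ports requires root):

    import subprocess
    import tempfile

    # Hypothetical rsyncd configuration: a single writable "uploads" module.
    RSYNCD_CONF = """\
    port = 873
    [uploads]
        path = /srv/uploads
        read only = false
        use chroot = true
    """

    with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
        f.write(RSYNCD_CONF)
        conf_path = f.name

    # --no-detach keeps the daemon in the foreground so a process
    # supervisor can manage it.
    subprocess.run(
        ["rsync", "--daemon", "--no-detach", f"--config={conf_path}"],
        check=True,
    )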

Matt H
+2  A: 

If you insist on using a web interface for this, the only way to pull it off is with something similar to a signed Java applet (I can't speak to Flash or other similar technologies, but I'm sure they're similarly capable).

Once you've crossed the threshold of going to an applet-like control, you have far more freedom in what you can do and how you can do it.

There's nothing wrong with HTTP per se for uploading files; it's just that the generic browser is a crummy client for it (no restartability, as mentioned, is but one limitation).

But with an applet you can select any protocol you want: you can throttle uploads so as not to saturate the client's connection, you can restart, send pieces, do checksums, whatever.

You don't need an entire web page devoted to this; it can be a small component. It can even be an invisible component (fired via JS). But the key factor is that it has to be a SIGNED component. An unsigned component can't interact with the user's file system, so you'll need to get the component signed. It can be your own cert, etc. It follows much the same mechanics as normal web certificates.

Obviously the client browser will need to support your applet tech as well.
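To make that concrete, here is a sketch, in Python rather than applet code, of the kind of restartable, piecewise, checksummed upload loop such a signed component could run. The endpoint URL, the offset query, and the header names are all invented for illustration, not a real API:

    import hashlib
    import os
    import urllib.request

    FILE = "machine.vmdk"                        # hypothetical file
    BASE = "https://uploads.example.com/upload"  # hypothetical endpoint
    CHUNK = 4 * 1024 * 1024                      # 4 MiB pieces

    def next_offset():
        # Ask the (hypothetical) server how many bytes it already holds,
        # so an interrupted upload restarts where it left off.
        with urllib.request.urlopen(f"{BASE}/offset?file={FILE}") as resp:
            return int(resp.read())

    def upload():
        size = os.path.getsize(FILE)
        offset = next_offset()
        with open(FILE, "rb") as f:
            f.seek(offset)
            while offset < size:
                piece = f.read(CHUNK)
                req = urllib.request.Request(BASE, data=piece, method="PUT")
                req.add_header("X-File-Name", FILE)
                req.add_header("X-Offset", str(offset))
                # A per-piece digest lets the server reject corrupt pieces.
                req.add_header("X-Checksum", hashlib.sha256(piece).hexdigest())
                urllib.request.urlopen(req)
                offset += len(piece)

    if __name__ == "__main__":
        upload()

Throttling would slot into the same loop (sleep between pieces, or cap the piece rate); the point is that once you control the client, none of this is hard.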

Will Hartung
A: 

I asked a similar question before and didn't get a satisfactory answer for a web-interface uploader.

YouTube allows users to upload 2 GB files. So my question is: how does YouTube/Google handle 2 GB file uploads from its millions of users?

http://stackoverflow.com/questions/2497625/what-is-the-best-way-to-implement-a-big-1gb-or-more-file-uploader-website-in-ph

Cory
@Cory: there are tools that layer on top of HTTP to try to get a more reliable upload. However, they cost a lot of money, and they're worth it if you need to do this with HTTP. I include them when I mention other file transfer solutions, since they're not straight HTTP. Still, consider the difference between 2 GB and tens of GB.
John Saunders