ansaurus

Question

Questions related to writing your own file downloader using multiple threads java

Answer 1

A:

1 Does file downloading become faster when multiple threads are used? In this code I am not able to see the benefit.

No. I would be very surprised if that were the case. The CPU should never have trouble keeping the network buffer fed.

2 How should I decide how many threads to create?

In my opinion, 0 extra threads.

4 The file that the file receiver receives is valid and not corrupted, but the checksum (I used FileUtils from commons-io) does not match. What's the problem?

Make sure you don't accidentally rely on strings and specific encodings. Read, send and checksum the data as raw bytes.
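To illustrate the point about strings, here is a minimal sketch. It assumes the checksum is a CRC32 and uses the JDK's java.util.zip.CRC32; the class name ChecksumSketch and its method are made up for this example and are not from the original code. It checksums raw bytes in chunks, and main shows how a round trip through a String changes binary data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only.
final class ChecksumSketch {

    // Computes a CRC32 over the raw bytes of a stream, reading in small chunks.
    static long crc32(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int count;
        while ((count = in.read(buffer)) != -1) {
            crc.update(buffer, 0, count);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Bytes that are not valid UTF-8, as is typical for binary files.
        byte[] original = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        // Decoding replaces invalid sequences, so the bytes do not survive the round trip.
        byte[] viaString = new String(original, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("lengths: " + original.length + " vs " + viaString.length);
        System.out.println("same checksum: "
                + (crc32(new ByteArrayInputStream(original))
                == crc32(new ByteArrayInputStream(viaString))));
    }
}
```

If either end of the transfer passes the data through a String or a Reader/Writer, this kind of silent change is one plausible cause of a checksum mismatch.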

5 This code runs out of memory when used with a large file (above 100 MB), because of the byte array that is created. How can I avoid this?

The obvious solution would be to read smaller chunks of the file. Have a look at the read method of DataInputStream

http://java.sun.com/j2se/1.4.2/docs/api/java/io/DataInputStream.html#read%28byte[],%20int,%20int%29

And, finally, some general pointers in the matter: Instead of using multiple threads for this kind of thing, I strongly encourage you to have a look at the java.nio package, specifically java.nio.channels and the Selector class.

EDIT: If you're really keen on getting it super-efficient, and have very large files, you could benefit from using UDP, and handle packet order and acknowledgements yourself. TCP does for instance guarantee that the packets received come in the same order as the packets sent. This is not something that you rely heavily on (since you could easily encode the "byte-offset" for each datagram yourself) and thus don't need to "pay" for.
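As a rough sketch of the "byte-offset" idea only: each datagram payload could start with the file offset of the chunk it carries, so the receiver can place chunks regardless of arrival order. The 8-byte header layout and the class name OffsetDatagramSketch are assumptions made for this example; acknowledgements and retransmission are not shown.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical payload format: 8-byte big-endian file offset, then the chunk bytes.
final class OffsetDatagramSketch {
    static final int HEADER_BYTES = Long.BYTES;

    // Builds a datagram payload carrying `length` bytes of `chunk` for the given file offset.
    static byte[] encode(long fileOffset, byte[] chunk, int length) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + length);
        buf.putLong(fileOffset);
        buf.put(chunk, 0, length);
        return buf.array();
    }

    // Reads the file offset back from a received payload.
    static long offsetOf(byte[] datagram) {
        return ByteBuffer.wrap(datagram).getLong();
    }

    // Returns the chunk bytes that follow the header.
    static byte[] payloadOf(byte[] datagram) {
        return Arrays.copyOfRange(datagram, HEADER_BYTES, datagram.length);
    }
}
```

The resulting byte array would be wrapped in a java.net.DatagramPacket for sending; the receiver can then seek to offsetOf(...) in the target file before writing payloadOf(...).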

aioobe 2010-04-29 11:44:42

Actually, I took the example from Download Accelerator: it makes 5 connections to get a file, and the file downloads faster. Suppose I have a 5 GB file; should I let one thread do that?

Shekhar 2010-04-29 11:49:06

I don't know the details of Download Accelerator, but I suspect that it possibly gains speed by "taking" speed from other downloading clients. If it downloads over HTTP it could gain some speed when downloading a large number of small files. It could then perform the handshaking while downloading another file. That is, it could eliminate the startup-latency for each individual file, but if you have a 5 gb file, I really doubt you'll get a speed boost by throwing more threads at the task.

aioobe 2010-04-29 11:59:49

@aioobe see my answer, AIUI it's more to do with bypassing TCP fairness by adding more connections. (I guess that's what you were trying to say with the first sentence?). Avoiding connection startup costs for lots of small files can be done by http pipelining, but I don't think download accelerators bother.

wds 2010-04-29 12:08:31

Answer 2

+1 A:

There's a bunch of questions here to answer. I'm not going to go through all of the code, but I can give you some tips.

First off, what some download accelerators do is indeed use the HTTP Range header to download parts of a file in parallel. Why does this work? TCP tries to allocate bandwidth fairly per connection. So if you're downloading a file from a server whose bandwidth is swamped, then you can receive a bigger share of the bandwidth by adding more connections. The same principle applies to servers that restrict outgoing bandwidth, since the limit is usually also applied per connection (sometimes taking the IP into consideration).
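A minimal sketch of how the parts might be laid out, assuming the file length is already known (for example from a Content-Length header). The class and method names are hypothetical; only the "bytes=start-end" syntax with an inclusive end comes from HTTP itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a file into contiguous byte ranges, one per connection.
final class RangeSplitSketch {

    // Returns at most `parts` Range header values that together cover [0, fileLength).
    static List<String> rangeHeaders(long fileLength, int parts) {
        if (fileLength <= 0 || parts <= 0) {
            throw new IllegalArgumentException("fileLength and parts must be positive");
        }
        List<String> headers = new ArrayList<>();
        long chunk = (fileLength + parts - 1) / parts; // ceiling division
        for (long start = 0; start < fileLength; start += chunk) {
            long end = Math.min(start + chunk, fileLength) - 1; // Range ends are inclusive
            headers.add("bytes=" + start + "-" + end);
        }
        return headers;
    }

    public static void main(String[] args) {
        for (String header : rangeHeaders(10, 3)) {
            System.out.println(header);
        }
    }
}
```

Each value would be set on its own connection, for example with HttpURLConnection.setRequestProperty("Range", value). A server that supports ranges answers with 206 Partial Content; one that does not may simply send the whole file.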

Obviously if everybody was doing this, we'd be left with a whole lot of TCP connections and their overhead, and not a lot of bandwidth to do the actual downloading, which is why even these download accelerators will only use 2-4 connections. Moreover, if you are the one writing the server, you really don't need to worry about this, as you will only be slowing yourself down (by adding more overhead).

Running out of memory: don't use a byte array; use a (buffered) InputStream (or, if you have some time, learn how to use java.nio and the byte buffers) and read chunks as you are sending the file. The Java tutorials cover all the basics.
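For the java.nio route, a chunked copy between two channels could look roughly like this. It is a sketch that assumes blocking channels; the class name ChannelCopySketch and the 8 KB buffer size are arbitrary choices for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper: copies a channel in fixed-size chunks, so memory use stays constant.
final class ChannelCopySketch {

    // Assumes blocking channels; returns the number of bytes copied.
    static long copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        long total = 0;
        while (in.read(buffer) != -1) {
            buffer.flip(); // switch the buffer from being filled to being drained
            while (buffer.hasRemaining()) {
                total += out.write(buffer); // a single write may be partial
            }
            buffer.clear(); // make the buffer ready for the next read
        }
        return total;
    }
}
```

The same method works for a FileChannel on one side and a SocketChannel on the other, which is why the file size no longer matters for memory use.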

wds 2010-04-29 12:05:03

@wds; would you agree that UDP could do better than TCP in a scenario like this?

aioobe 2010-04-29 12:35:28

@aioobe: on a reliable network, you end up saturating your bandwidth either way. Perhaps you manage to do better than TCP, but not by a lot (a couple of percent?), and at the cost of implementing a bunch of stuff yourself (i.e. you have to monitor link characteristics, know when to resend, provide your own checksums...). Weighing effort against benefit, I don't think it's a good idea.

wds 2010-04-29 13:10:11

Answer 3

+1 A:

1) Another reason why multiple connections may be faster is related to TCP window size.

throughput <= window size / roundtrip time

See http://en.wikipedia.org/wiki/TCP_tuning#Window_size for details.

You won't see much difference if you run tests on a local network, because the roundtrip time is small enough.

2) The only way to know for sure is to try. And the right number of threads will depend on the environment. If you need to download really big files, it might be worth it to first run a small calibration program that tries downloading with different numbers of threads.

3) I haven't looked there for a long time, but Azureus (now called Vuze) has a pretty complete API to download anything from torrent files to FTP ... And they probably have a quite efficient implementation...

Good luck !

Edit (clarification on window size) :

What you are trying to do is maximize throughput (download files faster). There is not much you can do about roundtrip time; it depends on the network. What you can do is increase the window size. The window size is automagically adjusted (there is plenty of documentation on this, but I'm too lazy to google it) to best fit the current state of the network. Basically, a larger window means better throughput as long as there isn't congestion or packet loss.

In the best case, you will get a window size of 64 KB (the 16-bit window field allows at most 65,535 bytes). At this point, unless you use some tricks (jumbo frames / window scaling) which are not supported by all routers on the internet, you get stuck at a maximum throughput of:

throughput <= 64 KB / roundtrip time

As you can't get a bigger window, you have to open multiple connections, each with its own window, to get around this limitation.
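To put numbers on this bound, here is a small worked calculation. It assumes no window scaling, so the window is capped at 65,535 bytes by the 16-bit TCP window field; the 100 ms and 1 ms roundtrip times and the 100 Mbit/s link are made-up example values, and the class name is hypothetical.

```java
// Hypothetical calculator for the bound: throughput <= window size / roundtrip time.
final class WindowBoundSketch {
    static final int MAX_UNSCALED_WINDOW_BYTES = 65_535;

    // Upper bound on the throughput of one connection, in bits per second.
    static double maxBitsPerSecond(int windowBytes, double rttSeconds) {
        return windowBytes * 8.0 / rttSeconds;
    }

    // Number of connections needed before the link, not the window, becomes the limit.
    static int connectionsToFill(double linkBitsPerSecond, int windowBytes, double rttSeconds) {
        return (int) Math.ceil(linkBitsPerSecond / maxBitsPerSecond(windowBytes, rttSeconds));
    }

    public static void main(String[] args) {
        double wan = maxBitsPerSecond(MAX_UNSCALED_WINDOW_BYTES, 0.100);
        double lan = maxBitsPerSecond(MAX_UNSCALED_WINDOW_BYTES, 0.001);
        System.out.printf("100 ms roundtrip: %.2f Mbit/s per connection%n", wan / 1e6);
        System.out.printf("1 ms roundtrip: %.2f Mbit/s per connection%n", lan / 1e6);
        System.out.println("connections to fill 100 Mbit/s at 100 ms: "
                + connectionsToFill(100e6, MAX_UNSCALED_WINDOW_BYTES, 0.100));
    }
}
```

With a 100 ms roundtrip the bound is roughly 5.24 Mbit/s per connection, so a single connection cannot fill a 100 Mbit/s link. With a 1 ms roundtrip the bound is far above the link speed, which matches the remark that tests on a local network show little difference.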

Notes :

As aioobe said, UDP isn't subject to the same limitations; this is one of the reasons why it can be more efficient.
A very efficient and scalable protocol to distribute large files is BitTorrent. As long as you don't need authentication / authorization of the downloads, it might work for you. And if you do need authorization, you can always encrypt the files ...

Guillaume 2010-04-29 12:34:00

Would you care to elaborate on the implications of your equation?

aioobe 2010-04-29 13:05:01

Answer 4

A:

Don't read huge file chunks into memory. No wonder you're running out. Just seek to the required position in the file and start copying via a sensibly sized buffer:

int count;
byte[] buffer = new byte[8192];
// or whatever takes your fancy, but sizes > the socket send buffer size are pointless
while ((count = in.read(buffer)) > 0)
  out.write(buffer, 0, count);
out.close();
in.close();

The same logic can be used at both ends: when writing the file at the receiver, use a RandomAccessFile and seek to the appropriate offset before starting this loop.
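On the receiving side, that might look like the following sketch. The class and method names are hypothetical, and it assumes the offset of the incoming part is known from whatever protocol the two ends agreed on.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

// Hypothetical receiver-side helper: writes one part of the file at its offset.
final class OffsetWriterSketch {

    // Copies everything from `in` into `target`, starting at `offset`; returns bytes written.
    static long writeAt(File target, long offset, InputStream in) throws IOException {
        long written = 0;
        try (RandomAccessFile file = new RandomAccessFile(target, "rw")) {
            file.seek(offset); // position the file pointer before copying
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) > 0) {
                file.write(buffer, 0, count);
                written += count;
            }
        }
        return written;
    }
}
```

Each part opens its own RandomAccessFile here, so parts arriving on different connections do not share a file pointer.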

However as other respondents have noted, the client's requirement is really pretty pointless. It doesn't buy anything much except expense and risk. I would just stream the file via a single connection.

What you should do is set large socket send and receive buffers at both ends, e.g. 60k. The default is 8k on Windows, which is uselessly low.
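A minimal sketch of setting the buffers with the standard java.net.Socket setters. The helper name is made up, and the sizes are only hints: the operating system may round or cap them, so the getters report what was actually applied.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical helper: creates an unconnected socket with larger buffers requested.
final class SocketBufferSketch {

    static Socket newTunedSocket(int bufferBytes) throws IOException {
        Socket socket = new Socket(); // unconnected, so the sizes are set before connect()
        socket.setReceiveBufferSize(bufferBytes);
        socket.setSendBufferSize(bufferBytes);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newTunedSocket(60 * 1024)) {
            System.out.println("receive buffer: " + socket.getReceiveBufferSize());
            System.out.println("send buffer: " + socket.getSendBufferSize());
        }
    }
}
```

On the accepting side, the corresponding call is ServerSocket.setReceiveBufferSize, made before the server socket is bound, so that accepted sockets inherit the size.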

EJP 2010-04-30 01:09:40
