Let's say you want to calculate the remaining download time, and you have all the information needed: file size, downloaded size, size left, time elapsed, momentary download speed, etc. How would you calculate the remaining download time?

Of course, the straightforward way would be either size left / momentary download speed, or (time elapsed / downloaded size) * size left. The trouble is that the first is subject to deviations in the momentary speed, while the latter doesn't adapt well to changing speeds.
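For reference, here is a sketch of those two naive estimates in Python (the names are made up for illustration; sizes and times are assumed to be in consistent units):

    def eta_from_momentary(size_left, momentary_speed):
        # size left / momentary speed
        return size_left / momentary_speed

    def eta_from_overall(size_left, downloaded_size, time_elapsed):
        # (time elapsed / downloaded size) * size left
        return (time_elapsed / downloaded_size) * size_left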

There must be some smarter way to do it, right? Take a look at the pirated software and music you currently download with uTorrent. It's easy to notice that it does more than the simple calculations mentioned above. Actually, I noticed that sometimes when the download speed drops, the remaining time also drops for a few moments until it readjusts.

+1  A: 

You could also keep track of your average speed and show a calculation based on that as well.

Ólafur Waage
Correct me if I'm wrong, but that's just the same as the second option that I mentioned.
Meat
+2  A: 

I think it's just an averaging algorithm. It averages the rate over a few seconds.

CookieOfFortune
+10  A: 

Well, as you said, using the instantaneous download speed isn't a great method, because it tends to fluctuate. However, something like an overall average isn't a great idea either, because there may be large fluctuations there as well.

Consider if I start downloading a file at the same time as 9 others. I'm only getting 10% of my normal speed, but halfway through the file, the other 9 finish. Now I'm downloading at 10x the speed I started at. My original 10% speed shouldn't be a factor in the "how much time is left?" calculation any more.

Personally, I'd probably take an average over the last 30 seconds or so and use that. That bases the calculation on recent speed without fluctuating wildly. 30 seconds may not be the right window; it would take some experimentation to find a good value.

Another option would be to set a sort of "fluctuation threshold", where you don't recalculate until the speed changes by more than that threshold. For example (an arbitrary number that, again, would require experimentation), you could set the threshold at 10%. Then, if you're downloading at 100 kB/s, you don't recalculate the remaining time until the download speed moves below 90 kB/s or above 110 kB/s. When that happens, the time is recalculated and a new threshold is set.
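A rough sketch of the two ideas combined, in Python (WINDOW and THRESHOLD are the tunable numbers discussed above, not values I'm claiming are right):

    from collections import deque

    WINDOW = 30        # seconds of history to average over (needs tuning)
    THRESHOLD = 0.10   # recalculate only on a speed change of more than 10%

    samples = deque(maxlen=WINDOW)  # one speed sample per second
    baseline = None                 # the speed behind the currently shown ETA

    def tick(momentary_speed, size_left):
        # Call once per second. Returns a fresh ETA in seconds,
        # or None to keep showing the previous estimate.
        global baseline
        samples.append(momentary_speed)
        avg = sum(samples) / len(samples)
        if baseline is None or abs(avg - baseline) > THRESHOLD * baseline:
            baseline = avg
            return size_left / avg if avg > 0 else None
        return None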

Chad Birch
The moving average that you and the others mentioned seems to be a step in the right direction, and my instinct says it would be better than the threshold method (of course, the two can be combined). I'm sure there are even better ways, though. I would guess that a speed histogram would provide a great deal of information; just thinking out loud here.
Meat
+6  A: 

The obvious way would be something in between: you need a 'moving average' of the download speed.
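A minimal sketch of one such smoother, in its exponential form (ALPHA is an assumed tuning constant; higher values react faster to speed changes):

    ALPHA = 0.1
    avg_speed = None

    def update(momentary_speed):
        global avg_speed
        if avg_speed is None:
            avg_speed = momentary_speed  # seed with the first sample
        else:
            avg_speed = ALPHA * momentary_speed + (1 - ALPHA) * avg_speed
        return avg_speed  # divide the size left by this to get the ETA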

Henk Holterman
+1. AKA a low-pass filter.
Paul
A: 

EDIT: Here's what I finally went with. I tried it, and it provides quite satisfying results:

I keep a zero-initialized array with one slot for each download speed between 0 and 500 kB/s (it could go higher if you expect such speeds), in 1 kB/s steps. I sample the momentary download speed (every second is a good interval) and increment the corresponding array item by one. Now I know how many seconds I have spent downloading the file at each speed.

The sum of all these values is the elapsed time (in seconds), and the sum of each value multiplied by its corresponding speed is the size downloaded so far. Assuming the speed variation pattern stabilizes, the ratio between each value in the array and the elapsed time gives a way to predict how long any given size will take; the size in question here is the size remaining.

So that's what I do: I take the sum of each array item multiplied by its corresponding speed (the index) and divide by the elapsed time, which gives the average speed so far. Then I divide the size left by this value, and that's the time left.

It takes a few seconds to stabilize, and then works pretty damn well.

Note that this is essentially a "complicated" overall average, so discarding older samples (as in a moving average) might improve it even further.
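A sketch of the whole thing in Python, assuming the 0-500 kB/s range and one-second sampling described above:

    MAX_SPEED = 500                            # kB/s cap, as described above
    seconds_at_speed = [0] * (MAX_SPEED + 1)   # seconds spent at each speed

    def sample(speed_kbs):
        # Call once per second with the momentary speed in kB/s.
        seconds_at_speed[min(int(speed_kbs), MAX_SPEED)] += 1

    def time_left(size_left_kb):
        elapsed = sum(seconds_at_speed)        # elapsed time in seconds
        downloaded = sum(speed * secs
                         for speed, secs in enumerate(seconds_at_speed))
        if downloaded == 0:
            return None                        # nothing sampled yet
        avg_speed = downloaded / elapsed       # time-weighted average, kB/s
        return size_left_kb / avg_speed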

Meat
You could add a statistical analysis, say: "Your download has a 95% chance of completing in 5 hours".
CookieOfFortune
More like: "95% of the time you're downloading at a constant speed that would have produced a 5-hour download". I don't think any user would appreciate such a prediction.
Meat
No no, I meant a more complicated solution based on an estimate of the statistical distribution of the user's download speed. You can calculate the distribution and then determine the most likely outcome. (It's possible this could provide a more accurate estimate than a plain average, but averaging is already pretty good.)
CookieOfFortune
+3  A: 

You could use an averaging algorithm where the old values decay linearly. If S_n is the speed at time n and A_{n-1} is the average at time n-1, then define your average speed as follows.

A_1 = S_1
A_2 = (S_1 + S_2)/2
A_n = S_n/(n-1) + A_{n-1}(1-1/(n-1))

In English, this means that the further in the past a measurement occurred, the less it matters, because its importance has decayed.

Compare this to the normal averaging algorithm: A_n = S_n/n + A_{n-1}(1-1/n)

You could also have it geometrically decay, which would weight the most recent speeds very heavily: A_n = S_n/2 + A_{n-1}/2

If the speeds are 4, 3, 5, 6, then:

A_4 = 4.5 (simple average)
A_4 ≈ 4.83 (linear decay)
A_4 = 5.125 (geometric decay)
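A quick Python sketch of the two decaying recurrences; the loop reproduces the linear-decay number from the worked example:

    def linear_decay(avg, speed, n):
        # A_n = S_n/(n-1) + A_{n-1} * (1 - 1/(n-1)), for n >= 3
        return speed / (n - 1) + avg * (1 - 1 / (n - 1))

    def geometric_decay(avg, speed):
        # A_n = S_n/2 + A_{n-1}/2
        return (speed + avg) / 2

    speeds = [4, 3, 5, 6]
    avg = speeds[0]                      # A_1 = S_1
    for n, s in enumerate(speeds[1:], start=2):
        if n == 2:
            avg = (speeds[0] + s) / 2    # A_2 = (S_1 + S_2)/2
        else:
            avg = linear_decay(avg, s, n)
    print(avg)                           # ~4.83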

Drew Hoskins
That's interesting. Bear in mind that the moving-average example many people suggested here is just a special case of this method: applying a window function over the samples. While the function in the moving-average example is simply a rectangular pulse, this one is more elaborate. Assuming the speed deviations don't follow any particular pattern, it would be interesting to see which method yields the better prediction.
Meat
One advantage of this approach is that you don't need to keep a list of values around, whereas with the moving-average approach you do. This approach will also produce a smoother curve than a moving average, which reacts more violently as outliers are dropped from the average.
Drew Hoskins
I like your answer. Could you perhaps provide some pseudo-code?
Magnus Johansson