I am looking for the best way to calculate the ETA of an operation (e.g., a file download) from linear progress information.

Let's say that I have the following method that gets called:

void ReportProgress(double position, double total)
{
    ...
}

I have a couple of ideas:

  • calculate the progress made in a set amount of time (like the last 10s) and use that speed as the average speed for the operation
  • keep a set of the last x progress reports, calculate the speed of each increment and use the average
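
The first idea could be sketched roughly as below. This is only an illustration: the class name WindowedSpeed, its methods, and the convention of passing timestamps in seconds are assumptions made for the example, not part of the question.

```cpp
#include <deque>
#include <utility>

// Hypothetical helper: keeps recent (time, position) samples and reports
// the average speed over roughly the last `windowSeconds` seconds.
class WindowedSpeed {
public:
    explicit WindowedSpeed(double windowSeconds = 10.0)
        : window_(windowSeconds) {}

    // Record a progress report taken at time `now` (in seconds).
    void add(double now, double position)
    {
        samples_.emplace_back(now, position);
        // Drop the oldest sample while the next one is already old enough
        // to span the whole window; always keep at least two samples.
        while (samples_.size() > 2 && now - samples_[1].first >= window_)
            samples_.pop_front();
    }

    // Units of progress per second over the window; 0 if not yet known.
    double speed() const
    {
        if (samples_.size() < 2) return 0.0;
        double dt = samples_.back().first - samples_.front().first;
        double dp = samples_.back().second - samples_.front().second;
        return dt > 0.0 ? dp / dt : 0.0;
    }

private:
    double window_;
    std::deque<std::pair<double, double>> samples_;
};
```

Calling add() from ReportProgress and dividing (total - position) by speed() would then give a remaining-time figure, provided speed() is non-zero.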
A: 

It will depend on how consistent the operation timing is. If it's consistent, it would be perfectly reasonable to use the average time of previous operations. If it's not, you're better off timing the current operation and extrapolating.

Edit: If the operation is inconsistent from previous runs, and also inconsistent from start to finish, then you have an unsolvable problem. Predicting the unpredictable is always fun :)

You might decide ahead of time if you want to underestimate or overestimate, and add a fudge factor to the estimate. For example, if you want to overestimate, and the first 10% takes 6 seconds, you might extrapolate to 60 seconds then multiply by 1.5 to get a total estimate of 90 seconds. As the percentage complete grows, decrease the fudge factor until at 100% it becomes 1.0.
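
A shrinking fudge factor like this might look as follows. It is a minimal sketch that assumes the factor decays linearly with progress; the function name paddedTotalEstimate and the linear schedule are illustrative choices, and other decay curves would work just as well.

```cpp
#include <cassert>

// Extrapolate the total duration, then pad it with a fudge factor that
// decays linearly from `initialFudge` at 0% to 1.0 at 100%.
// (Hypothetical helper; the linear decay is an assumption.)
double paddedTotalEstimate(double elapsedSeconds, double fractionDone,
                           double initialFudge = 1.5)
{
    assert(fractionDone > 0.0 && fractionDone <= 1.0);

    // Plain extrapolation: 6 s for the first 10% suggests 60 s overall.
    double extrapolatedTotal = elapsedSeconds / fractionDone;

    // The padding shrinks toward 1.0 as the operation nears completion.
    double fudge = initialFudge - (initialFudge - 1.0) * fractionDone;

    return extrapolatedTotal * fudge;
}
```

With 6 seconds elapsed at 10%, this gives 60 × 1.45 = 87 seconds, a little under the 90 seconds in the example above, because under this schedule the factor has already started to decay by the 10% mark.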

Mark Ransom
Let's consider a file download: the real speed could change a lot, depending on variables I cannot control. The ETA should seem reasonable to the end user. (Maybe I should clarify the question better.)
Maghis
+2  A: 

Something like this should do the trick:

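void ReportProgress(double position, double total)
{
    // TimeType and GetTime() are placeholders for your platform's time API.
    static TimeType startTime;

    if (position <= 0)
    {
        startTime = GetTime(); // the operation is just starting
        return; // to avoid a divide-by-zero error
    }

    TimeType elapsedTime = GetTime() - startTime;
    // Extrapolate the total duration from the fraction completed so far.
    TimeType estimatedTotal = elapsedTime * total / position;
    TimeType estimatedRemaining = estimatedTotal - elapsedTime;
    TimeType estimatedEndTime = startTime + estimatedTotal;

    // Print the results here
}

The estimate gets closer to the truth as the progress approaches 100%.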

e.James
Basic but good. Maybe you are assuming the operation timing is very consistent, though. As the operation progresses, the estimate is always slow to converge on the "real remaining time". E.g., in the download case, another download could finish and my download speed would double; at that point the estimate will take a long time to become accurate.
Maghis
Absolutely true, and that's the trouble with estimates. It is equally possible that another download will start, causing your download speed to be cut in half.
e.James
+4  A: 

I actually despise both those ideas because they've both bitten me before as a developer.

The first doesn't take into account the situation where the operation actually gets faster: it says that there's 10 minutes to go, and I come back after 3 and it's finished.

The second doesn't take into account the operation getting slower - I think Windows Explorer must use this method since it always seems to take 90% of the time copying 90% of the files, then another 90% of the time copying that last 10% of the files :-).

I've long since taken to calculating both those figures and averaging them. The clients don't care (they didn't really care about the other two options either, they just want to see some progress) but it makes me feel better, and that's really all I care about at the end of the day ;-)
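
One possible reading of "averaging them" is to turn each speed figure into a remaining-time estimate and take the mean of the two. The sketch below assumes the two speeds have already been measured by whatever means; the function names and parameters are hypothetical.

```cpp
#include <cassert>

// Remaining time implied by a single speed figure (units per second).
double remainingSeconds(double position, double total, double speed)
{
    assert(speed > 0.0 && position <= total);
    return (total - position) / speed;
}

// Mean of the remaining-time estimates from the two speed figures.
double blendedRemainingSeconds(double position, double total,
                               double windowSpeed, double incrementSpeed)
{
    double fromWindow = remainingSeconds(position, total, windowSpeed);
    double fromIncrements = remainingSeconds(position, total, incrementSpeed);
    return (fromWindow + fromIncrements) / 2.0;
}
```

Note that averaging the two times is not the same as averaging the two speeds first: the mean of the times leans toward the slower figure, so it errs on the side of a longer estimate.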

paxdiablo
I know, customers rarely notice, but as a software developer I am almost obsessive about this kind of thing :). Averaging them is a nice idea.
Maghis
+1  A: 

If you're wanting an ETA rather than a 'progress bar' then can you supply more than one figure?

Calculate the average download speed over a set period of time (depending on how long the overall download is likely to last, if you're looking at 10+ minutes then every 5s or so would be ok) and keep a record of the averages.

Then you can provide two figures, an upper and lower estimate.

If you're confident that the averages are going to be a good indication of the total time to download, then you could display the 40th percentile and the 60th - if the average download times vary widely then the 10th and 90th might be better.

I'd rather see a ballpark '21-30 minutes' and it be accurate than be told 29 min 35.2 seconds and it be miles out, and varying wildly from one update to the next.
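
A range like that could be derived from the recorded averages roughly as follows. This is a sketch only: it assumes the record is a list of speeds in units per second, uses the nearest-rank definition of a percentile, and the names percentile and etaRange are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Nearest-rank percentile (pct in 1..100) of a non-empty set of samples.
double percentile(std::vector<double> samples, int pct)
{
    assert(!samples.empty() && pct >= 1 && pct <= 100);
    std::sort(samples.begin(), samples.end());
    // Integer ceiling of pct/100 * n, so the rank is always in 1..n.
    std::size_t rank =
        (static_cast<std::size_t>(pct) * samples.size() + 99) / 100;
    return samples[rank - 1];
}

// Returns {lower, upper} remaining-time estimates in seconds. The fast
// (high-percentile) speed gives the lower bound, the slow one the upper.
std::pair<double, double> etaRange(const std::vector<double>& speeds,
                                   double remainingUnits,
                                   int lowPct = 10, int highPct = 90)
{
    double slow = percentile(speeds, lowPct);
    double fast = percentile(speeds, highPct);
    assert(slow > 0.0);
    return {remainingUnits / fast, remainingUnits / slow};
}
```

Passing 40 and 60 instead of the default 10 and 90 narrows the range for downloads whose speed is known to be steady.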

Stringent Software
+2  A: 

I think that this problem is pretty much unsolvable, but it is possible to create some accurate estimations with a bit more knowledge of the process that is executing. And in the cases where there are large unknowns it is better to inform the user of those unknowns so that they can take them into account.

To take the simple example of downloading a batch of files you have two known variables:

  • The number of files
  • The size of the files

For each file there is a constant overhead (the time it takes to establish a connection, and the time it takes to open a file on the file system). There is also the obvious download time associated with the size of the files. Creating a function that can express this as time remaining in terms of the current download speed is easy, and accurate provided the download speed doesn't fluctuate too much. But there lies the problem.
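
Such a function might look like the sketch below. It assumes the per-file overhead has been measured or guessed in advance and is the same for every file; the name batchRemainingSeconds and its parameters are hypothetical.

```cpp
#include <cassert>

// Remaining time for a batch download: a fixed overhead for each file
// still to be fetched, plus the transfer time for the bytes still to
// come at the current speed. (Illustrative model, not a measured one.)
double batchRemainingSeconds(int filesRemaining, double bytesRemaining,
                             double overheadPerFileSeconds,
                             double bytesPerSecond)
{
    assert(filesRemaining >= 0 && bytesPerSecond > 0.0);
    return filesRemaining * overheadPerFileSeconds
         + bytesRemaining / bytesPerSecond;
}
```

For example, 4 files totalling 1,000,000 bytes, with 0.5 s of overhead each at 50,000 bytes per second, come out at 2 + 20 = 22 seconds.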

With an accurate model of the operation you are performing it is easy to predict how long it will take provided there are no outside influences. And that is rarely possible.

However, you could go for a solution that attempts to understand and explain these outside influences. The user may find it helpful to be alerted when the speed changes dramatically, as they can adjust their plans to fit the new ETA. It may also be helpful to explain what factors are affecting the current operation, e.g.:

Your download will complete in 6 minutes, if the download speed stays at 50k/s

This allows the user to make some educated guesses if they know that speeds are likely to change, and ultimately leads to less frustration.

Jack Ryan
+1  A: 

Bram Cohen has talked about this a bit. He has put a lot of effort into the ETA calculations in BitTorrent (yet in a talk he mentioned that no one has yet come up to him and said "hey! great ETA calculations in bittorrent man!"). It's not a simple problem.

Some relevant links:

Thomas
Thanks very much for the links, I didn't notice them the first time. Now I'm back on a similar task and was curious to have another look at my question.
Maghis