views: 409
answers: 11
I need to find the total execution time of a program over different inputs. The program reads some data and writes it into another file. Both the data values and the size of the data are different every time.

I want to know how long it will take, in general, for any size of data.

Is there an algorithm for finding this based on the total timing of the program for a single execution?

For example, if I know that for a single execution:

    a.program - execution time: 1.2 sec
              - creates a 100 KB file

can I find out how long it will take for n executions on different data sizes?

+8  A: 

I don't quite understand your question, but I believe that what you're asking is how to figure out the execution time of a program in advance of running it.

This is related to the halting problem, which is undecidable.

Apologies if I am misunderstanding your question.

Edit: To respond to your clarification, there is no general algorithm for extrapolating the runtime on larger inputs from the runtime on smaller inputs. Analysis of algorithms is a very tricky business. There are heuristics you can use, though. For example, you could measure the runtime on inputs of varying "size" (say, 10, 100, 1000, 10000) and try to fit a curve to the function size -> runtime, as sketched below.
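
For concreteness, here is a minimal sketch of that heuristic in Perl. The timings are invented for illustration, and it assumes the runtime roughly follows a power law t = c * n^k, whose exponent it estimates with a least-squares fit on the logs:

    #!/usr/bin/perl
    # Rough sketch: estimate a power-law exponent from (input size, runtime)
    # pairs. The sample timings below are made up for illustration.
    use strict;
    use warnings;

    my %runtime = (10 => 0.02, 100 => 0.25, 1000 => 3.1, 10000 => 38.0);

    # Least-squares fit of log(t) = log(c) + k * log(n).
    my ($sx, $sy, $sxx, $sxy, $m) = (0, 0, 0, 0, 0);
    for my $n (keys %runtime) {
        my ($x, $y) = (log($n), log($runtime{$n}));
        $sx += $x; $sy += $y; $sxx += $x * $x; $sxy += $x * $y;
        $m++;
    }
    my $k = ($m * $sxy - $sx * $sy) / ($m * $sxx - $sx * $sx);
    my $c = exp(($sy - $k * $sx) / $m);
    printf "t(n) is roughly %.3g * n**%.2f\n", $c, $k;
    printf "predicted t(100000): %.0f s\n", $c * 100_000 ** $k;

If the data clearly doesn't fit a power law (or a straight line on a log-log plot), that is itself useful information: the growth may be exponential, or dominated by something other than input size.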

Jason
+1, although note that for real programs you can usually cut the Gordian knot of the halting problem by just running the thing while holding a stopwatch. This method only fails for those programs which don't halt, and in practice if your program is bug free and was designed to halt, then it halts. Alan Turing can stick his nose out.
Steve Jessop
@onebyone.livejournal.com this also fails when the algorithm doesn't have linear growth (i.e. O(n)). If, for example, the algorithm is O(n^2), the length of time it will take for 1,000 items is 10,000 times longer than the length of time it will take for 10 items.
Chas. Owens
@Chas. Owens: ...which is why you choose a range of input sizes. Start with n = 10, step to n = 100, then n = 1000, then n = 10000, and so on; stop when you are convinced of the (non-proven) run-time. Just be sure not to stop too soon. This is why you need a good asymptotic analysis of your algorithm to begin with.
San Jacinto
@unknown (google): Just the size of the data set is not enough. Some algorithms behave differently depending on the data itself. For instance, quicksort has an average case of `O(n log n)` but a worst case of `O(n^2)`. If you were to run your trial method you would likely see the `O(n log n)` performance, and then be surprised when it sometimes takes `O(n^2)`. If you want to understand the performance characteristics of your program there are no shortcuts: you must use the techniques from the Algorithmic Complexity branch of Computer Science.
Chas. Owens
@Chas. Owens: That's fine, I agree completely. However, there are times when the algorithm is complex enough that proving its run-time is not a good use of time, when all you want is a general indication of how long the thing runs. Much to the chagrin of algorithm designers everywhere, much of the world doesn't have room in the development schedule to prove run-times when simple tests like this can be surprisingly accurate. If you know the nature of the algorithm, it may be a good fit; if you don't, don't try it. That was the intention of my last comment.
San Jacinto
+2  A: 

1. The halting problem is undecidable, i.e. even if you have the data and the program, there is (in the general case!) no way to tell whether it will ever finish, let alone how long it will take. 2. The 'program' in the halting problem means the complete state, e.g. your program plus the data it processes.

So it's doubly intractable :)

emvy
+4  A: 

There is no perfect algorithm for this, and it will be highly variable - a first run may be very slow, but a second run will be much quicker since the data from disk is cached in memory.

You could measure your program with a variety of inputs and try to build a model for input size/complexity compared to execution time and then use the results of that to estimate execution time.
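
A minimal measurement harness along those lines might look like this in Perl; `./a.program`, `make_input.pl`, and the sizes are placeholders you would swap for your own program and input generator, and each size is worth running more than once so the first, cache-cold run doesn't skew the model:

    #!/usr/bin/perl
    # Run the program on inputs of several sizes and record wall-clock time.
    # "./a.program" and "make_input.pl" are hypothetical placeholders.
    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    for my $size (10, 100, 1_000, 10_000) {        # e.g. size in KB
        system("perl make_input.pl $size > input.dat") == 0
            or die "could not generate input of size $size";
        my $t0 = [gettimeofday];
        system("./a.program input.dat output.dat") == 0
            or die "run failed for size $size";
        printf "%8d  %.3f s\n", $size, tv_interval($t0);
    }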

Michael
"build a model for input size/complexity compared to execution time" - this sounds hard, and it might be hard. But often it just means running it 3 times each with 10 different-size inputs, quickly drawing a scatter plot, noticing that it's a straight line, and job done.
Steve Jessop
+3  A: 

If I understand you correctly, you want to find out how long it takes a process to execute.

If that is the case, use the Benchmark module.

With it you can take timings at different places and compare how long the different parts of your program take.
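
A minimal example (assuming the core Benchmark module that ships with Perl) could look like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timediff timestr);

    my $t0 = Benchmark->new;

    # ... the part of the program you want to time goes here ...

    my $t1 = Benchmark->new;
    print "that section took: ", timestr(timediff($t1, $t0)), "\n";

Taking a Benchmark->new snapshot around each phase (reading, processing, writing) lets you see where the time actually goes.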

Xetius
+1  A: 

From the question, I am not sure whether you are trying to calculate the execution time before running the program, or to record the time it takes your script to run. If you are trying to pre-calculate it, I agree with the other answers that say it is not possible.

If you want to record your execution time, simply add 2 global date variables to your program, store the current date and time in one immediately at the start of execution, and then store the time in the second variable upon termination. Use a date difference function to give you the seconds (or your desired time unit) elapsed.
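
As a sketch of that approach (using plain time(), which has one-second resolution; Time::HiRes gives sub-second timings if you need them):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $start_time = time();    # seconds since the epoch, taken at startup

    # ... read the data and write the output file ...

    my $end_time = time();      # taken just before the program exits
    printf "elapsed: %d seconds\n", $end_time - $start_time;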

Bork Blatt
+1  A: 

It's hard to know for sure, but I don't think this is a version of the halting problem (or, at least, it can be simplified so that it's not).

I think you're just asking for estimates of how long a series of reads and writes will take, but with varying amounts of reading and writing.

The easiest way to do it is to make some empirical measurements (the more the better) and use those measurements to make estimates about how long future runs will take. If you discover that reading 10 MB of data takes 10 seconds, then you could estimate that 100 MB might take about 100 seconds. (This, of course, makes the assumption that you're looking at an O(n) algorithm...if not, you'll have to adjust accordingly.)

Of course, this will be prone to error, because of whatever else is going on on the system...but, depending on your needs, it may give you estimates that are good enough.

Of course, if you are able to update your estimates as you are reading/writing, you can improve them.

Beska
+2  A: 

If you know the asymptotic running time of your algorithm, for example knowing that a sorting algorithm is O(n log n), you can run it on small inputs and calculate (though perhaps only as a range) what it would be for larger inputs. The problem is that analyzing most algorithms is very hard. Otherwise, you could run it a few times on smaller inputs and do some type of regression (probably nonlinear) to discover/approximate an equation for the algorithm's performance characteristics, and use that to calculate for larger inputs.

James M.
+1  A: 

If you are asking about the practical solution, then use the Benchmark module, or some other process where you record the time. Then plot the execution time against the input size for a number of inputs and interpolate (but beware of extrapolation, as this xkcd cartoon demonstrates).

If you want to know about the theory you need to understand "computational complexity" which is a good search term to get you started.

For example, if you run over the data once, then twice as much data will usually take roughly twice as long. The best sorting algorithms typically take O(N ln N), so twice as much data will take slightly more than twice as long: going from N to 2N multiplies N ln N by 2 * (1 + ln 2 / ln N), which is only a little more than 2 when N is large. But even these only give you bounds on the length of time, and the constants will depend on things like disk access, memory, other programs running, etc.

Nick Fortescue
+1  A: 

You have to know whether your program halts. That can't be decided automatically, but you can determine it if you know its design. Then you have to know at least your program's asymptotic complexity; better still, the real complexity formula. Then you can benchmark it on an adequate set of inputs, interpolate your data to obtain the constants, and finally put the constants into the equation and compute. Easy, isn't it? ;-)
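
For example, if you have established that the program is O(n log n), a sketch of "find the constant, then plug it back in" might look like this (the measured time of 2.4 s for n = 100,000 is invented):

    #!/usr/bin/perl
    use strict;
    use warnings;

    sub nlogn { my $n = shift; return $n * log($n) }

    # One (hypothetical) measurement: n = 100_000 items took 2.4 seconds.
    my ($n_measured, $t_measured) = (100_000, 2.4);
    my $c = $t_measured / nlogn($n_measured);       # the constant factor

    for my $n (1_000_000, 10_000_000) {
        printf "predicted time for n = %9d: %.1f s\n", $n, $c * nlogn($n);
    }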

Hynek -Pichi- Vychodil
+1  A: 

If you can re-execute your program, you can time it using the Unix `time` command. If not, save the system time at the start, save it again at the end of the program, and print out the difference.

+1  A: 

There is an entire branch of Computer Science devoted to this: Algorithmic Complexity. In short, you analyze the code to determine what performance characteristics it has. For instance, most good sorting algorithms have an average runtime of O(n log n), where n is the size of the array to be sorted. This means the length of time to sort 100 items is about 100 * ln 100 / ln 2 * x, or 664x, where x is the amount of time needed to run one instruction.
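
As a quick check of that arithmetic, and to see how the factor grows with n, here is the n * log2(n) term evaluated for a few sizes:

    #!/usr/bin/perl
    use strict;
    use warnings;

    sub log2 { my $n = shift; return log($n) / log(2) }

    for my $n (100, 1_000, 10_000) {
        printf "n = %6d: roughly %7.0f * x\n", $n, $n * log2($n);
    }
    # n = 100 gives about 664 * x, the figure quoted above.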

Chas. Owens