views:

197

answers:

6

How do you determine what the cycle cost is for a given C library function?

Specifically, how expensive is it to call sqrt()?

I should clarify: I'm just looking for a general idea of the cost associated with calling certain C library functions. I'm assuming that professional C programmers have a basic sense of what these functions cost. For example, sqrt() might be understood to degrade performance to O(n^2).

A: 

Remember that you can benchmark these functions. This means running it a million (or whatever) times and timing how long it takes.

Alexander Rafferty
How? What's the best way to get at the actual time (besides holding a stopwatch in my hand)? Can you give me some code that will do this for me? But more to the point, I don't care so much about actual time cost. What I'm interested in is cycle counts: how expensive is a call like this? How do you determine the expense?
dvanaria
Try looking into the time.h header file. Note: I do not know if it is a C or C++ header.
Alexander Rafferty
see this [page](http://www.cs.cf.ac.uk/Dave/C/node21.html) (Example 1).
Femaref
+1  A: 

I'm afraid the days when you could determine the cost of a function call in clock cycles are long gone (for most platforms).

It might still be possible for some simple microcontrollers, but desktop processors are way too advanced for that.

The cost depends on many factors. The processor can reorder instructions and execute the instructions of your function in parallel, both with each other and with other instructions, even ones from a different thread (if the core supports simultaneous multithreading). Targets of branches are "guessed" and executed before the condition is fully evaluated; if the guess turns out to be wrong, the speculative work is discarded. The guess is based on statistics the processor gathers at run time.

The difference in execution time between the case where the code is in cache and the case where it isn't can be enormous. Even if you determined that one execution of the function took x cycles, the next time you run the code you may get y, and y can be very different from x.

If you have a simple microcontroller whose documentation says explicitly how many clock cycles each instruction takes, you can take the disassembly of your function and add up the costs of all the instructions it consists of.

Maciej Hehl
True enough, but at the same time, I think it's missing the point. Basically, the OP wants to know "how guilty do I have to feel if I'm using sqrt inside an inner loop in my code?", and THAT we can give a reasonable answer to with naive methods.
leo grrr
@leo grrr Sure, but I think making the OP realise why the question about cycle cost can't get a precise answer has some value too. I'm tired of seeing "measure" everywhere, so I decided not to write it once more :)
Maciej Hehl
Based on other comments the OP has left on some answers, I agree that you have correctly identified a need for information!
leo grrr
+2  A: 

With a good inlining compiler, the cost of calling sqrt() is a single coprocessor instruction: FSQRT. Cycle counts are indeed hard to determine on superscalar architectures.

alxx
+1  A: 

Square root converges quadratically, when it converges at all. The number of iterations depends on the initial guess fed to the Newton-Raphson algorithm that computes it.

Here's a nice page that says it takes four iterations for four-digit accuracy in 32 bits, and five iterations in 64 bits.

This is how it's done for situations where it's not done in hardware.

duffymo
+2  A: 

If you don't need anything really fancy, which you don't, you can just call time() in a loop.

There's actually code for exactly what you want right here! http://beige.ucs.indiana.edu/B673/node104.html

The basic idea is:

#include <math.h>
#include <stdio.h>
#include <time.h>

#define HUGE_N 100000000L /* some huge number, to hopefully drown out statistical noise */

int main(void) {
    time_t t0, t1;
    t0 = time(NULL);
    for (long i = 1; i <= HUGE_N; i++)
        sqrt((double)i);
    t1 = time(NULL);
    printf("%ld calls took about %ld seconds\n", HUGE_N, (long)(t1 - t0));
    return 0;
}
leo grrr
A smart compiler will remove the `sqrt(i)` call altogether; you should do something with the result. See this question asked earlier today: http://stackoverflow.com/questions/3793410/benchmarking-math-h-square-root-and-quake-square-root
st0le
Actually, I just built it with `gcc -O2 -S` and it bottomed out in an fsqrt instruction. Maybe it's not smart enough? Actually, since the compiler doesn't know what happens inside a function (it might have side effects), I don't expect it would remove that call. If I just did some math and then didn't do anything with the result, I'd agree!
leo grrr
+3  A: 

First, you need to clearly define what you mean by "cycle cost". It could be interpreted as either algorithmic cycles or processor cycles. Once you have done that, the next step is determining your approach: theoretical or test.

First, theoretical.

If it is algorithmic cycles, I would make this determination via a code review. The steps would be:

  1. Determine the C library that you are using
  2. Obtain the source code for that library's specific sqrt implementation
  3. Analyze the source code to estimate its complexity (the cycle cost of the algorithm)

But as others have posted, most instruction sets in use today have a sqrt instruction, so I'd expect the C implementation to just invoke that instruction; the algorithmic cycle count should be constant.

If it is processor cycles, you would make this determination in a similar fashion, but some of the details change:

  1. Determine the machine code that will run on the processor by doing steps 1, 2, and 3 above
  2. Learn more about processors than I know
  3. Probably realize this is futile and just test it (as other useful SO answers suggest)

For a test approach you want to:

  1. Determine all the environments you want this information for
    1. Processors
    2. Operating systems
  2. Author a test application
    1. It should perform all the operations you want to test, including anything outside of sqrt itself that you care about (such as casting inputs and outputs)
    2. It should perform a set number of iterations over a long period of time
    3. It should also be able to perform a single call and exit, invoked a set number of times over a long period of time
    4. Really, give it some thought: identify why you want this information and how you will use it, then design a test app that removes as many variables as possible
  3. Run the application
    1. Run in an environment as close as possible to the one you care about
    2. If the end use of this information is performance prediction for users, test in a representative environment (if users may be running 10 different web browser windows while watching a Netflix movie stream, test under those conditions)
  4. Do some sort of analysis on the data set to get a reasonable prediction
PatrickV