Hi,

I have to calculate the following:

float2 y = CONSTANT;
for (int i = 0; i < totalN; i++)
   h[i] = cos(y*i);

totalN is a large number, so I would like to do this more efficiently. Is there any way to improve it? I suspect there is because, after all, we know the values of cos(n) for n = 1..N, so maybe there's some theorem that allows me to compute this faster. I would really appreciate any hint.

Thanks in advance,

Federico

A: 

You can do this using complex numbers.

If you define x = cos(y) + i sin(y), then cos(y*i) is the real part of x^i.

You can compute x^i for all i iteratively. A complex multiply costs four real multiplies and two adds.

jdv
But computing N exponentials will not be faster than computing N cosines. At least there is no real reason why it should.
AVB
@AVB: I added that you can compute them iteratively: multiply the current value by x. That's one complex multiply per entry.
jdv
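A minimal Python sketch of this iterative complex-multiply approach (the function name and test values are mine, not from the answer):

```python
import math

def cos_table_complex(y, total_n):
    """cos(i*y) for i = 0..total_n-1 via repeated complex multiplication."""
    x = complex(math.cos(y), math.sin(y))  # x = exp(i*y)
    z = 1 + 0j                             # z tracks x**i, starting at i = 0
    out = []
    for _ in range(total_n):
        out.append(z.real)                 # Re(x**i) = cos(i*y)
        z *= x                             # one complex multiply per entry
    return out
```

Each entry then costs one complex multiply instead of a cosine call.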
A: 

Knowing cos(n) doesn't help -- your math library already does this kind of trivial thing for you.

Knowing that cos((i+1)y)=cos(i*y+y)=cos(i*y)*cos(y)-sin(i*y)*sin(y) can help, if you precompute cos(y) and sin(y), and keep track of both cos(i*y) and sin(i*y) along the way. It may result in some loss of precision, though - you'll have to check.
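A sketch of this cos/sin tracking in Python (naming is mine), for comparison against the library cosine:

```python
import math

def cos_table_addition(y, total_n):
    """Track cos(i*y) and sin(i*y) together via the angle-addition formulas."""
    cy, sy = math.cos(y), math.sin(y)   # precomputed once
    c, s = 1.0, 0.0                     # cos(0), sin(0)
    out = []
    for _ in range(total_n):
        out.append(c)
        # cos((i+1)y) = cos(iy)cos(y) - sin(iy)sin(y)
        # sin((i+1)y) = sin(iy)cos(y) + cos(iy)sin(y)
        c, s = c * cy - s * sy, s * cy + c * sy
    return out
```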

AVB
+3  A: 

Here's a method, but it uses a little extra storage to track the sine. It uses the trig identities:

cos(a + b) = cos(a)cos(b)-sin(a)sin(b)
sin(a + b) = sin(a)cos(b)+cos(a)sin(b)

Then here's the code:

h[0] = 1.0;
double g1 = sin(y);
double glast = g1;
h[1] = cos(y);
for (int i = 2; i < totalN; i++){
    h[i] = h[i-1]*h[1] - glast*g1;
    glast = glast*h[1] + h[i-1]*g1;
}

If I didn't make any errors, that should do it. Of course there could be round-off problems, so be aware of that. I implemented this in Python and it is quite accurate.

Justin Peel
+6  A: 

Using one of the most beautiful formulas of mathematics, Euler's formula
exp(i*x) = cos(x) + i*sin(x),

substituting x := n * phi:

cos(n*phi) = Re( exp(i*n*phi) )
sin(n*phi) = Im( exp(i*n*phi) )

exp(i*n*phi) = exp(i*phi) ^ n

Raising to the power n is n repeated multiplications. Therefore you can calculate cos(n*phi) and simultaneously sin(n*phi) by repeated complex multiplication by exp(i*phi), starting with (1 + i*0).

Code examples:

Python:

from math import *

DEG2RAD = pi/180.0 # conversion factor degrees --> radians
phi = 10*DEG2RAD # constant e.g. 10 degrees

c = cos(phi)+1j*sin(phi) # = exp(1j*phi)
h=1+0j
for i in range(1,10):
  h = h*c
  print("%d %8.3f" % (i, h.real))

or C:

#include <stdio.h>
#include <math.h>

// number of values to calculate:
#define N 10

// conversion factor degrees --> radians:
#define DEG2RAD (3.14159265/180.0)

// e.g. constant is 10 degrees:
#define PHI (10*DEG2RAD)

typedef struct
{
  double re,im;
} complex_t;


int main(int argc, char **argv)
{
  complex_t c;
  complex_t h[N];
  int index;

  c.re=cos(PHI);
  c.im=sin(PHI);

  h[0].re=1.0;   
  h[0].im=0.0;
  for(index=1; index<N; index++)
  {
    // complex multiplication h[index] = h[index-1] * c;
    h[index].re=h[index-1].re*c.re - h[index-1].im*c.im; 
    h[index].im=h[index-1].re*c.im + h[index-1].im*c.re; 
    printf("%d: %8.3f\n",index,h[index].re);
  }
  return 0;
}
Curd
Nice solution but what about floating point errors after large number of iterations? They are inevitable here.
Kirk Broadhurst
@Kirk Broadhurst: Try it out: using the Python example I found that even for a large number of iterations (e.g. several tens of thousands) the difference from the values calculated by the built-in cos function stays in the range 1E-10.
Curd
+6  A: 

I'm not sure what kind of accuracy vs. performance compromises you're willing to make, but there are extensive discussions of various sinusoid approximation techniques at these links:

Fun with Sinusoids - http://www.audiomulch.com/~rossb/code/sinusoids/
Fast and accurate sine/cosine - http://www.devmaster.net/forums/showthread.php?t=5784

Edit (I think this is the "Don Cross" link that's broken on the "Fun with Sinusoids" page):

Optimizing Trig Calculations - http://groovit.disjunkt.com/analog/time-domain/fasttrig.html

datageist
The method in the last section of the last link looks faster than any other answer here, in my opinion.
Justin Peel
+1  A: 

How accurate do you need the resulting cos(x) to be? If you can live with some error, you could create a lookup table by sampling the unit circle at 2*PI/N intervals and then interpolating between two adjacent points. N would be chosen to achieve the desired level of accuracy.

What I don't know is whether interpolation is actually less costly than computing a cosine. Since cosine is usually done in microcode on modern CPUs, it may not be.
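A sketch of the lookup-table idea with linear interpolation (the table size N = 4096 is an arbitrary assumption; larger tables give more accuracy):

```python
import math

N = 4096                               # table size (tunable); larger -> more accurate
TWO_PI = 2.0 * math.pi
# One extra entry so interpolation at the wrap-around point needs no special case.
TABLE = [math.cos(TWO_PI * k / N) for k in range(N + 1)]

def cos_lut(x):
    """Approximate cos(x) by linear interpolation in a precomputed table."""
    t = (x % TWO_PI) / TWO_PI * N      # position in table units, in [0, N)
    k = int(t)
    frac = t - k
    return TABLE[k] * (1.0 - frac) + TABLE[k + 1] * frac
```

With N = 4096 the interpolation error stays well below 1e-5.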

Larry
Last time I checked, fcos (asm) took ~35 cycles, accessing uncached memory took several hundred. Time it both ways and see if it's faster (table cached or not)
Arthur Kalliokoski
@Arthur: if you look at the code the OP has, he wants to compute the cos and then update an array `h[i]`, perhaps to keep it for later use. So the real issue is very likely interpolation vs. computation, as Larry mentions, and if interpolation can be avoided (by choosing a large enough N), this could well be what the OP is looking for. This solution also has the advantage of not accumulating floating point errors.
Moron
+4  A: 

Maybe the simplest formula is

cos(n+y) = 2cos(n)cos(y) - cos(n-y).

If you precompute the constant 2*cos(y) then each value cos(n+y) can be computed from the previous 2 values with one single multiplication and one subtraction. I.e., in pseudocode

h[0] = 1.0
h[1] = cos(y)
m = 2*h[1]
for (int i = 2; i < totalN; ++i)
  h[i] = m*h[i-1] - h[i-2]
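The pseudocode runs almost as-is in Python; a runnable version (function name is mine):

```python
import math

def cos_table_recurrence(y, total_n):
    """cos(i*y) from the recurrence cos((i+1)y) = 2cos(y)cos(iy) - cos((i-1)y)."""
    h = [0.0] * total_n
    h[0] = 1.0
    if total_n > 1:
        h[1] = math.cos(y)
        m = 2.0 * h[1]                  # the single precomputed constant
        for i in range(2, total_n):
            h[i] = m * h[i - 1] - h[i - 2]
    return h
```

One multiply and one subtraction per entry, with no sine tracking at all.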
Accipitridae
that is beautiful, very nice solution , +1
xxxxxxx
+1 for a very nice solution; however, there are bound to be floating point errors which will compound over many iterations, especially if totalN is large.
Kirk Broadhurst
@Kirk Broadhurst: Yup, you certainly raise an important point here. I did a small test before posting the algorithm above; it appears that the errors accumulate only linearly (e.g. after a million iterations the error is about 10^{-11} when using double). A better analysis of the error would be necessary. It would also make sense to compute a pair of correct cosines after a given number of iterations.
Accipitridae
To bound error, just compute the cosine from scratch every 1000 iterations or so.
Keith Randall
+1  A: 

There are some good answers here, but they are all recursive. Recursive calculation does not work reliably for the cosine function with floating-point arithmetic: you will invariably get rounding errors which quickly compound.

Consider calculating y = 45 degrees with totalN = 10,000. You won't end up with 1 as the final result.
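A quick way to measure the drift described here, comparing the three-term recurrence from the answer above against math.cos (the harness is my own):

```python
import math

def max_recurrence_error(y, total_n):
    """Largest |recurrence result - math.cos| over total_n steps of
    the recurrence cos((i+1)y) = 2cos(y)cos(iy) - cos((i-1)y)."""
    m = 2.0 * math.cos(y)
    prev, cur = 1.0, math.cos(y)        # cos(0*y), cos(1*y)
    worst = 0.0
    for i in range(2, total_n):
        prev, cur = cur, m * cur - prev
        worst = max(worst, abs(cur - math.cos(i * y)))
    return worst
```

Running it for various y and totalN shows how fast (or slowly) the error actually grows in practice.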

Kirk Broadhurst
+1 You have a good point here. However, using y=45 makes for a bad benchmark: at least in a small test that I did, some rounding errors cancel each other out. I.e., with y=45 I get an error of 10^-15, while other values of y give a larger error of about 10^-13.
Accipitridae
+1  A: 

To address Kirk's concerns: all of the solutions based on the recurrence for cos and sin boil down to computing

x(k) = R x(k - 1),

where R is the matrix that rotates by y and x(0) is the unit vector (1, 0). If the true result for k - 1 is x'(k - 1) and the true result for k is x'(k), then the error goes from e(k - 1) = x(k - 1) - x'(k - 1) to e(k) = R x(k - 1) - R x'(k - 1) = R e(k - 1) by linearity. Since R is what's called an orthogonal matrix, R e(k - 1) has the same norm as e(k - 1), and the error grows very slowly. (The reason it grows at all is due to round-off; the computer representation of R is in general almost, but not quite orthogonal, so it will be necessary to restart the recurrence using the trig operations from time to time depending on the accuracy required. This is still much, much faster than using the trig ops to compute each value.)
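A sketch of this rotation view with the periodic restart (the restart interval of 1000 is an arbitrary assumption; pick it from your accuracy requirement):

```python
import math

def cos_table_rotation(y, total_n, restart=1000):
    """cos(i*y) by rotating (cos, sin) with the 2x2 matrix R,
    re-seeded from the trig ops every `restart` steps to bound drift."""
    cy, sy = math.cos(y), math.sin(y)   # entries of R
    c, s = 1.0, 0.0                     # x(0) = (1, 0)
    out = []
    for i in range(total_n):
        if i % restart == 0:            # restart the recurrence occasionally
            c, s = math.cos(i * y), math.sin(i * y)
        out.append(c)
        # x(k) = R x(k-1)
        c, s = c * cy - s * sy, s * cy + c * sy
    return out
```

Between restarts, each entry still costs only four multiplies and two adds.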

Nice explanation.
Accipitridae