views:

847

answers:

8

Hi,

I have the following code doing Sin/Cos function using a pre-calculated memory table. in the following example the table has 1024*128 items covering all the Sin/Cos values from 0 to 2pi. I know I can use Sin/Cos symmetry and hold only 1/4 of the values but them I will have more 'ifs' when computing the value.

private const double PI2 = Math.PI * 2.0; 
private const int TABLE_SIZE = 1024 * 128;
private const double TABLE_SIZE_D = (double)TABLE_SIZE;
private const double FACTOR = TABLE_SIZE_D / PI2;

private static double[] _CosineDoubleTable;
private static double[] _SineDoubleTable;

Set the translation table

private static void InitializeTrigonometricTables(){
   _CosineDoubleTable = new double[TABLE_SIZE];
   _SineDoubleTable = new double[TABLE_SIZE];

   for (int i = 0; i < TABLE_SIZE; i++){
      double Angle = ((double)i / TABLE_SIZE_D) * PI2;
      _SineDoubleTable[i] = Math.Sin(Angle);
      _CosineDoubleTable[i] = Math.Cos(Angle);
   }
}

The Value is a double in radians.

Value %= PI2;  // In case that the angle is larger than 2pi
if (Value < 0) Value += PI2; // in case that the angle is negative
int index = (int)(Value * FACTOR); //from radians to index and casted in to an int
double sineValue = _SineDoubleTable[index]; // get the value from the table

I'm looking for a faster way to do this. The above 4 lines are ~25% of the whole process (executed billions of times).

+3  A: 

If you have to compute it so many times,

  1. Use a processor-specific math library like the IKML or ACML and
    1. Compute the values in groups (vectors).
    2. When you need both, always compute the sin and cos of a value at the same time.
  2. Check your algorithm complexity and implementation design.
  3. Make sure you're using all the processor has to offer - x64 architecture, plus any vector instructions that would help.
280Z28
A: 

That's going to be pretty damn fast as it is.

If you really need to squeeze every imaginable drop of performance out of this code, you may want to consider writing this part of it (including the outer loop that loops billions of times) in a C++ dll (or even ASM). Make sure your compiler is set to allow the largest possible instruction set available to you.

[Edit] I missed how large the tables are - this could very well slow down your code significantly due to cache misses. Have you tried benchmarking it against Math.Cos(), or other methods of approximating trig functions (you can get very good approximations with a few simple multiplications using Taylor Series)

BlueRaja - Danny Pflughoeft
I thought about that and tried smaller tables but 128x1024 is pretty much the break-even point. Smaller tables won't run faster but larger tables start showing slowdowns. I'm running it on an Intel 8200 Quad
Gilad
+1  A: 

This looks pretty good, except for the mod operation. Can you do without it?

If the values are near zero, you can use

while(Value > PI2) Value -= PI2;
while(Value < 0) Value += PI2;

Or it may be faster to cast the index to an integer (possibly out of range) first, then mod that as as integer. If the table size is going to be a multiple of 2, you can even use bit operations (if the compiler doesn't do this already).

UncleO
I can't drop the mod operation,it is a must. Your idea looks interesting, I will give it a try tomorrow although I'm skeptical that a single mod operation is cheaper than multiple plus/minus, then again, it depends on the times it is needed where I'm not so sure.
Gilad
Though the single subtraction was a decent idea if `Value` is known to be near-0, a single mod is *significantly* faster than a while loop.
BlueRaja - Danny Pflughoeft
Maybe you could do the mod operation after multiplying with the factor. mod 1024*128 should be a faster, because it can be translated to a bitwise and instruction.
nikie
+1  A: 

No guarantee that it'll do a lot of good, but depending on your processor, integer math is often faster than floating point math. That being the case, I'd probably rearrange the first three lines to first calculate an integer, then reduce its range (if necessary). Of course, as BlueRaja pointed out, using C++ will almost certainly help as well. Using assembly language probably won't do much good though -- for a table lookup like this, a C++ compiler can usually produce pretty good code.

If possible, I'd also look very hard at your accuracy requirements -- without knowing what you're doing with the values, it's hard to say, but for a lot of purposes, your table size and the precision you're storing are far beyond what's necessary or even close to useful.

Finally, I'd note that it's worth at least looking into whether this whole strategy is worthwhile at all. At one time, there was no question that using tables to avoid complex calculations was a solid strategy. Processors have sped up a lot faster than memory though -- to the point that such a table lookup is often a net loss nowadays. In fact, nearly the only way the table stands a chance at all is if it's small enough to fit in the processor cache.

Jerry Coffin
A: 

One thing you could try would be to use the fact that cos(x) = sin(x + pi/2). And make the sine table one quarter larger, so you can reuse it as the cosine table starting one quarter in. Not sure if C# allows you to get a pointer to the middle of the table, as C would. But even if not, the decreased cache usage might be worth more than the added time for the offset into the sine table.

That, is, expressed with C:

double* _CosineDoubleTable = &_SineDoubleTable[TABLESIZE / 4];
Sami
+5  A: 

I'm assuming Taylor expansions are no use for you. So if you want to use a table: You only need one table half as big.

  1. cos(x) = sin(pi/2-x).
  2. sin(pi + x) = -sin(x)

You can make your's code non-branching. Convert first to int format.

int index = (int)(Value * FACTOR);
index %= TABLE_SIZE; // one instuction (mask)
index = (index >= 0) ? index :TABLE_SIZE-index; // one instruction isel
double sineValue = _SineDoubleTable[index];

Compare with Math.Sin anyway. Profile Profile Priofile. (Cache miss might slow down your code in real examples.)

Charles Beattie
+1 for all the best advice in one post.
BlueRaja - Danny Pflughoeft
+12  A: 

You could try to use unsafe code to eliminate array bounds checking.
But even a unsafe, optimized version does not seem to come anywhere near Math.Sin.

Results based on 1'000'000'000 iterations with random values:

(1) 00:00:57.3382769  // original version
(2) 00:00:31.9445928  // optimized version
(3) 00:00:21.3566399  // Math.Sin

Code:

static double SinOriginal(double Value)
{
    Value %= PI2;
    if (Value < 0) Value += PI2;
    int index = (int)(Value * FACTOR);
    return _SineDoubleTable[index];
}

static unsafe double SinOptimized(double* SineDoubleTable, double Value)
{
    int index = (int)(Value * FACTOR) % TABLE_SIZE;
    return (index < 0) ? SineDoubleTable[index + TABLE_SIZE]
                       : SineDoubleTable[index];
}

Test program:

InitializeTrigonometricTables();
Random random = new Random();

SinOriginal(random.NextDouble());
var sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    SinOriginal(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(1) {0}  // original version", sw.Elapsed);

fixed (double* SineDoubleTable = _SineDoubleTable)
{
    SinOptimized(SineDoubleTable, random.NextDouble());
    sw = System.Diagnostics.Stopwatch.StartNew();
    for (long i = 0; i < 1000000000L; i++)
    {
        SinOptimized(SineDoubleTable, random.NextDouble());
    }
    sw.Stop();
    Console.WriteLine("(2) {0}  // optimized version", sw.Elapsed);
}

Math.Sin(random.NextDouble());
sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    Math.Sin(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(3) {0}  // Math.Sin", sw.Elapsed);
dtb
+1 from me even if this isn't the fastest (though I can't wait to test it tomorrow)
Gilad
+1 from me :) - Can you compare to math.sin? and do a sin(x) then a cos (x) to get some cache thrash :)
Charles Beattie
+1 for only one bold enough to write a benchmark
BlueRaja - Danny Pflughoeft
+1 premature optimization , root of all evil , etc etc. Good lesson here
pm100
+1 from me too. Lookup tables never really work on modern processors - random access to memory is a significant bottleneck these days. The only real way to improve this is to use vector processing (SIMD / SSE) but that does depend on the algorithm being used.
Skizz
+2  A: 

There are some great notes on fast computation of sine and cosine here: http://www.research.scea.com/gdc2003/fast-math-functions.html

He covers how to map the input values onto the desired range, and also using mini-max polynomials (minimizing the maximum error over the interval, which is different from Taylor series), and even SIMD optimizations.

celion