ansaurus

Question

Fast Sin/Cos using a pre computed translation array

Answer 1

+3 A:

If you have to compute it so many times,

Use a processor-specific math library like the IKML or ACML and
1. Compute the values in groups (vectors).
2. When you need both, always compute the sin and cos of a value at the same time.
Check your algorithm complexity and implementation design.
Make sure you're using all the processor has to offer - x64 architecture, plus any vector instructions that would help.

280Z28 2010-01-18 18:24:28

Answer 2

A:

That's going to be pretty damn fast as it is.

If you really need to squeeze every imaginable drop of performance out of this code, you may want to consider writing this part of it (including the outer loop that loops billions of times) in a C++ dll (or even ASM). Make sure your compiler is set to allow the largest possible instruction set available to you.

[Edit] I missed how large the tables are - this could very well slow down your code significantly due to cache misses. Have you tried benchmarking it against Math.Cos(), or other methods of approximating trig functions (you can get very good approximations with a few simple multiplications using Taylor Series)

BlueRaja - Danny Pflughoeft 2010-01-18 18:26:11

I thought about that and tried smaller tables but 128x1024 is pretty much the break-even point. Smaller tables won't run faster but larger tables start showing slowdowns. I'm running it on an Intel 8200 Quad

Gilad 2010-01-18 19:13:19

Answer 3

+1 A:

This looks pretty good, except for the mod operation. Can you do without it?

If the values are near zero, you can use

while(Value > PI2) Value -= PI2;
while(Value < 0) Value += PI2;

Or it may be faster to cast the index to an integer (possibly out of range) first, then mod that as as integer. If the table size is going to be a multiple of 2, you can even use bit operations (if the compiler doesn't do this already).

UncleO 2010-01-18 18:26:33

I can't drop the mod operation,it is a must. Your idea looks interesting, I will give it a try tomorrow although I'm skeptical that a single mod operation is cheaper than multiple plus/minus, then again, it depends on the times it is needed where I'm not so sure.

Gilad 2010-01-18 18:32:17

Though the single subtraction was a decent idea if `Value` is known to be near-0, a single mod is *significantly* faster than a while loop.

BlueRaja - Danny Pflughoeft 2010-01-18 18:47:18

Maybe you could do the mod operation after multiplying with the factor. mod 1024*128 should be a faster, because it can be translated to a bitwise and instruction.

nikie 2010-01-18 18:56:11

Answer 4

+1 A:

No guarantee that it'll do a lot of good, but depending on your processor, integer math is often faster than floating point math. That being the case, I'd probably rearrange the first three lines to first calculate an integer, then reduce its range (if necessary). Of course, as BlueRaja pointed out, using C++ will almost certainly help as well. Using assembly language probably won't do much good though -- for a table lookup like this, a C++ compiler can usually produce pretty good code.

If possible, I'd also look very hard at your accuracy requirements -- without knowing what you're doing with the values, it's hard to say, but for a lot of purposes, your table size and the precision you're storing are far beyond what's necessary or even close to useful.

Finally, I'd note that it's worth at least looking into whether this whole strategy is worthwhile at all. At one time, there was no question that using tables to avoid complex calculations was a solid strategy. Processors have sped up a lot faster than memory though -- to the point that such a table lookup is often a net loss nowadays. In fact, nearly the only way the table stands a chance at all is if it's small enough to fit in the processor cache.

Jerry Coffin 2010-01-18 18:34:27

Answer 5

A:

One thing you could try would be to use the fact that cos(x) = sin(x + pi/2). And make the sine table one quarter larger, so you can reuse it as the cosine table starting one quarter in. Not sure if C# allows you to get a pointer to the middle of the table, as C would. But even if not, the decreased cache usage might be worth more than the added time for the offset into the sine table.

That, is, expressed with C:

double* _CosineDoubleTable = &_SineDoubleTable[TABLESIZE / 4];

Sami 2010-01-18 18:52:12

Answer 6

+5 A:

I'm assuming Taylor expansions are no use for you. So if you want to use a table: You only need one table half as big.

cos(x) = sin(pi/2-x).
sin(pi + x) = -sin(x)

You can make your's code non-branching. Convert first to int format.

int index = (int)(Value * FACTOR);
index %= TABLE_SIZE; // one instuction (mask)
index = (index >= 0) ? index :TABLE_SIZE-index; // one instruction isel
double sineValue = _SineDoubleTable[index];

Compare with Math.Sin anyway. Profile Profile Priofile. (Cache miss might slow down your code in real examples.)

Charles Beattie 2010-01-18 18:52:21

+1 for all the best advice in one post.

BlueRaja - Danny Pflughoeft 2010-01-18 19:01:22

Answer 7

+12 A:

You could try to use unsafe code to eliminate array bounds checking.
But even a unsafe, optimized version does not seem to come anywhere near Math.Sin.

Results based on 1'000'000'000 iterations with random values:

(1) 00:00:57.3382769  // original version
(2) 00:00:31.9445928  // optimized version
(3) 00:00:21.3566399  // Math.Sin

Code:

static double SinOriginal(double Value)
{
    Value %= PI2;
    if (Value < 0) Value += PI2;
    int index = (int)(Value * FACTOR);
    return _SineDoubleTable[index];
}

static unsafe double SinOptimized(double* SineDoubleTable, double Value)
{
    int index = (int)(Value * FACTOR) % TABLE_SIZE;
    return (index < 0) ? SineDoubleTable[index + TABLE_SIZE]
                       : SineDoubleTable[index];
}

Test program:

InitializeTrigonometricTables();
Random random = new Random();

SinOriginal(random.NextDouble());
var sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    SinOriginal(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(1) {0}  // original version", sw.Elapsed);

fixed (double* SineDoubleTable = _SineDoubleTable)
{
    SinOptimized(SineDoubleTable, random.NextDouble());
    sw = System.Diagnostics.Stopwatch.StartNew();
    for (long i = 0; i < 1000000000L; i++)
    {
        SinOptimized(SineDoubleTable, random.NextDouble());
    }
    sw.Stop();
    Console.WriteLine("(2) {0}  // optimized version", sw.Elapsed);
}

Math.Sin(random.NextDouble());
sw = System.Diagnostics.Stopwatch.StartNew();
for (long i = 0; i < 1000000000L; i++)
{
    Math.Sin(random.NextDouble());
}
sw.Stop();
Console.WriteLine("(3) {0}  // Math.Sin", sw.Elapsed);

dtb 2010-01-18 19:17:12

+1 from me even if this isn't the fastest (though I can't wait to test it tomorrow)

Gilad 2010-01-18 19:21:57

+1 from me :) - Can you compare to math.sin? and do a sin(x) then a cos (x) to get some cache thrash :)

Charles Beattie 2010-01-18 19:33:08

+1 for only one bold enough to write a benchmark

BlueRaja - Danny Pflughoeft 2010-01-18 21:27:12

+1 premature optimization , root of all evil , etc etc. Good lesson here

pm100 2010-01-18 21:54:08

+1 from me too. Lookup tables never really work on modern processors - random access to memory is a significant bottleneck these days. The only real way to improve this is to use vector processing (SIMD / SSE) but that does depend on the algorithm being used.

Skizz 2010-01-18 23:04:34

Answer 8

+2 A:

There are some great notes on fast computation of sine and cosine here: http://www.research.scea.com/gdc2003/fast-math-functions.html

He covers how to map the input values onto the desired range, and also using mini-max polynomials (minimizing the maximum error over the interval, which is different from Taylor series), and even SIMD optimizations.

celion 2010-01-18 22:42:25

ansaurus

tags:

views:

answers:

Fast Sin/Cos using a pre computed translation array

related questions