Hi All

I have a quick question about how to speed up the calculation of infinite series. Here is one example: arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + ...

Let's say you have some library which allows you to work with big numbers. The first obvious solution would be to add/subtract each element of the sequence in turn until you reach some target N.

You can also save x^n, so that for each next element, instead of calculating x^(n+2) from scratch, you can do lastX*(x^2).
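For reference, a minimal sketch of that sequential baseline. It assumes plain `double` in place of a big-number type, and the function name `atan_series_sequential` is only illustrative:

```cpp
#include <cassert>
#include <cstddef>

// Sequential baseline: sum the first n_terms terms of
// x - x^3/3 + x^5/5 - ..., carrying the power of x forward.
// Assumption: double stands in for a big-number type.
double atan_series_sequential(double x, std::size_t n_terms) {
    assert(x >= -1.0 && x <= 1.0);  // the series only converges for |x| <= 1
    const double x2 = x * x;
    double power = x;   // holds x^(2i+1), starting at i = 0
    double sign = 1.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < n_terms; ++i) {
        sum += sign * power / static_cast<double>(2 * i + 1);
        power *= x2;    // lastX * x^2 instead of recomputing the power
        sign = -sign;
    }
    return sum;
}
```

Every iteration depends on the `power` left by the previous one, which is why the loop looks hard to parallelize as written.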

But overall it seems to be a very sequential task. What can you do to utilize multiple processors (8+)?

Thanks a lot!

EDIT: I will need to calculate somewhere from 100k to 1M iterations. This is a C++-based application, but I am looking for an abstract solution, so it shouldn't matter. Thanks for the replies.

+6  A: 

You need to break the problem down to match the number of processors or threads you have. In your case you could have, for example, one processor working on the even terms and another working on the odd terms. Instead of precalculating x^2 and using lastX*(x^2), you use lastX*(x^4) to skip every other term. To use 8 processors, multiply the previous term by x^16 so that each processor handles every 8th term, and add up the partial sums at the end.
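A rough sketch of this striding scheme, assuming `double` instead of a big-number type and `std::async` for the workers. The names `strided_partial_sum` and `atan_series_strided` are made up for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Sum of the terms i = first, first + stride, first + 2*stride, ... (i < n_terms)
// of the series sum_i (-1)^i * x^(2i+1) / (2i+1).
double strided_partial_sum(double x, std::size_t first, std::size_t stride,
                           std::size_t n_terms) {
    double power = std::pow(x, static_cast<double>(2 * first + 1));
    // Jumping ahead by `stride` terms multiplies the power by x^(2*stride),
    // e.g. x^16 for 8 workers.
    const double step = std::pow(x, static_cast<double>(2 * stride));
    double sign = (first % 2 == 0) ? 1.0 : -1.0;
    // The sign only flips between a worker's terms when the stride is odd.
    const double sign_step = (stride % 2 == 0) ? 1.0 : -1.0;
    double sum = 0.0;
    for (std::size_t i = first; i < n_terms; i += stride) {
        sum += sign * power / static_cast<double>(2 * i + 1);
        power *= step;
        sign *= sign_step;
    }
    return sum;
}

// Launch one task per worker and add up the partial sums.
double atan_series_strided(double x, std::size_t n_terms, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::future<double>> parts;
    for (std::size_t w = 0; w < workers; ++w) {
        parts.push_back(std::async(std::launch::async, strided_partial_sum,
                                   x, w, workers, n_terms));
    }
    double total = 0.0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

For example, `atan_series_strided(0.5, 64, 8)` gives each of 8 workers 8 terms. On some toolchains this needs to be built with `-pthread`.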

P.S. Most of the time when presented with a problem like this, it's worthwhile to look for a more efficient way of calculating the result. Better algorithms beat more horsepower most of the time.

Mark Ransom
+1  A: 

Well, for this example, you might sum the series (if I've got the brackets in the right places):

(-1)^i * (x^(2i + 1))/(2i + 1)

Then on processor 1 of 8 compute the sum of the terms for i = 0, 8, 16, 24, ... (i starts at 0, which gives the leading term x)

Then on processor 2 of 8 compute the sum of the terms for i = 1, 9, 17, 25, ...

and so on, finally adding up the partial sums.

Or, you could do as you (nearly) suggest: give i = 0..15 (say) to processor 1, i = 16..31 to processor 2 and so on, and each processor can compute each successive power of x from the previous one. If you want more than 8x16 elements in the series, then assign more to each processor in the first place.

I doubt whether, for this example, it is worth parallelising at all. I suspect that you will get to double-precision accuracy on 1 processor while the parallel threads are still waking up. But that's just a guess for this example, and you can probably find many series for which parallelisation is worth the effort.
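The contiguous-block variant could be sketched like this, again assuming `double` in place of a big-number type and `std::async` for the workers. The helper names `block_partial_sum` and `atan_series_blocked` are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Sum of the terms i = begin .. end-1 of sum_i (-1)^i * x^(2i+1) / (2i+1).
double block_partial_sum(double x, std::size_t begin, std::size_t end) {
    const double x2 = x * x;
    // One full power computation per block, then successive powers from the previous one.
    double power = std::pow(x, static_cast<double>(2 * begin + 1));
    double sign = (begin % 2 == 0) ? 1.0 : -1.0;
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        sum += sign * power / static_cast<double>(2 * i + 1);
        power *= x2;
        sign = -sign;
    }
    return sum;
}

// Split [0, n_terms) into one contiguous block per worker.
double atan_series_blocked(double x, std::size_t n_terms, std::size_t workers) {
    assert(workers > 0);
    const std::size_t chunk = (n_terms + workers - 1) / workers;  // ceiling division
    std::vector<std::future<double>> parts;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * chunk, n_terms);
        const std::size_t end = std::min(begin + chunk, n_terms);
        parts.push_back(std::async(std::launch::async, block_partial_sum,
                                   x, begin, end));
    }
    double total = 0.0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

Each block pays for one full power computation up front and then proceeds exactly like the sequential loop.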

And, as @Mark Ransom has already said, a better algorithm ought to beat brute-force and a lot of processors every time.

High Performance Mark
+2  A: 

If you're trying to calculate the value of pi to millions of places or something, you first want to pay close attention to choosing a series that converges quickly and is amenable to parallelization. Then, if you have enough digits, it will eventually become cost-effective to split them across multiple processors; you will have to find or write a bignum library that can do this.

Note that you can factor out the variables in various ways; e.g.:

atan(x)= x - x^3/3 + x^5/5 - x^7/7 + x^9/9 ...
       = x*(1 - x^2*(1/3 - x^2*(1/5 - x^2*(1/7 - x^2*(1/9 ...

Although the second line is more efficient than a naive implementation of the first line, the latter calculation still has a linear chain of dependencies from beginning to end. You can improve your parallelism by combining terms in pairs:

       = x*(1-x^2/3) + x^5*(1/5-x^2/7) + x^9*(1/9 ...
       = x*( (1-x^2/3) + x^4*((1/5-x^2/7) + x^4*(1/9 ...
       = [yet more recursive computation...]
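As a small illustration, here is a sketch of the nested (Horner-style) form next to a version that pairs adjacent terms and nests in x^4. It assumes `double` arithmetic, and the names `atan_horner` and `atan_paired` are invented for this example:

```cpp
#include <cassert>
#include <cstddef>

// Nested form: x*(1 - x^2*(1/3 - x^2*(1/5 - ...))), evaluated innermost first.
double atan_horner(double x, std::size_t n_terms) {
    assert(x >= -1.0 && x <= 1.0);  // the series only converges for |x| <= 1
    const double x2 = x * x;
    double acc = 0.0;
    for (std::size_t i = n_terms; i-- > 0;) {
        acc = 1.0 / static_cast<double>(2 * i + 1) - x2 * acc;
    }
    return x * acc;
}

// Paired form: pair k combines series terms 2k and 2k+1 into
// 1/(4k+1) - x^2/(4k+3), and the nesting then proceeds in x^4.
double atan_paired(double x, std::size_t n_pairs) {
    assert(x >= -1.0 && x <= 1.0);
    const double x2 = x * x;
    const double x4 = x2 * x2;
    double acc = 0.0;
    for (std::size_t k = n_pairs; k-- > 0;) {
        // This value does not depend on acc, so the pairs could be
        // computed ahead of time or on other processors.
        const double pair = 1.0 / static_cast<double>(4 * k + 1)
                          - x2 / static_cast<double>(4 * k + 3);
        acc = pair + x4 * acc;
    }
    return x * acc;
}
```

With `n_terms = 2 * n_pairs` the two functions sum the same terms; the paired version halves the length of the dependent chain at the cost of extra independent work per step.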

However, this speedup is not as simple as you might think, since the time taken by each computation depends on the precision needed to hold it. You need to take this into account when designing your algorithm. Your algebra is also intimately involved: in the above case, for instance, you'll get infinitely repeating fractions if you do regular divisions by your constant numbers, so you need to figure out some way to deal with that.

comingstorm