views:

177

answers:

3

I currently have an application which can contain 100s of user defined formulae. Currently, I use reverse polish notation to perform the calculations (pushing values and variables on to a stack, then popping them off the stack and evaluating). What would be the best way to start parallelizing this process? Should I be looking at a functional language?

The calculations are performed on arrays of numbers so for example a simple A+B could actually mean 100s of additions. I'm currently using Delphi, but this is not a requirement going forward. I'll use the tool most suited to the job. Formulae may also be dependent on each other So we may have one formula C=A+B and a second one D=C+A for example.

A: 

Could you please elaborate a bit more on your question? Is there a dependency between the 100-dreds of functions you want to process?

Can you give examples of some of the functions performed?

What language/environment are you using right now? Why are you thinking of a functional language, because your environment supports it or because you think functional languages are good on recursion?

You might want to use a language that does concurrency well, which is something different than recursion.

Roalt
+1  A: 

Without knowing more, I woiuld suggest taking a SIMD style approach if possible. That is, create threads to compute all formulas for a single data set. Trying to divide the computation of formulas to parallelise them wouldn't yield much speed improvement as the logic required to be able to split up the computations into discreet units suitable for threading would be hard to write and harder to get right, the overhead would cancel out any speed gains. It would also suffer quickly from diminishing returns.

Now, if you've got a set of formulas that are applied to many sets of data then the parallelisation becomes easier and would scale better. Each thread does all computations for one set of data. Create one thread per CPU core and set its affinity to each core. Each thread instantiates one instance of the formula evalutation code. Create a supervisor which loads a single data set and passes it an idle thread. If no threads are idle, wait for the first thread to finish processing its data. When all data sets are processed and all threads have finished, then exit. Using this method, there's no advantage to having more threads than there are cores on the CPU as thread switching is slow and will have a negative effect on overall speed.

If you've only got one data set then it is not a trivial task. It would require parsing the evaluation tree for branches without dependancies on other branches and farming those branches to separate threads running on each core and waiting for the results. You then get problems synchronising the data and ensuring data coherency.

Skizz

Skizz
+1  A: 

Let's assume your formulae (equations) are not cyclic, as otherwise you cannot "just" evaluate them. If you have vectorized equations like A = B + C where A, B and C are arrays, let's conceptually split them into equations on the components, so that if the array size is 5, this equation is split into

a1 = b1 + c1
a2 = b2 + c2
...
a5 = b5 + c5

Now assuming this, you have a large set of equations on simple quantities (whether integer, rational or something else).

If you have two equations E and F, let's say that F depends_on E if the right-hand side of F mentions the left-hand side of E, for example

E: a = b + c
F: q = 2*a + y

Now to get towards how to calculate this, you could always use randomized iteration to solve this (this is just an intermediate step in the explanation), following this algorithm:

1 while (there is at least one equation which has not been computed yet)
2   select one such pending equation E so that:
3     for every equation D such that E depends_on D:
4       D has been already computed
5   calculate the left-hand side of E

This process terminates with the correct answer regardless on how you make your selections on line // 2. Now the cool thing is that it also parallelizes easily. You can run it in an arbitrary number of threads! What you need is a concurrency-safe queue which holds those equations whose prerequisites (those the equations depend on) have been computed but which have not been computed themselves yet. Every thread pops out (thread-safely) one equation from this queue at a time, calculates the answer, and then checks if there are now new equations so that all their prerequisites have been computed, and then adds those equations (thread-safely) to the work queue. Done.

antti.huima
No cyclic equations. Dependencies can easily be worked out. So sounds good to me
Steve

related questions