views: 59
answers: 2
Hi Experts,

I'm interested in any conventional wisdom on how to approach the following problem. Note that I'm a hardware guy, so go easy on the software-industry knowledge/terminology/acronyms.

I'm providing an online application that includes very complex math computations, such as fast Fourier transforms, that involve nested for-loops and very large data arrays (1.6 GB each). Users on the internet will access this application, enter some custom parameters, and submit a job that calls these math computations. To keep the user's wait to a minimum, and to allow multiple independent sessions for multiple simultaneous users (each user having a separate thread), I'm wondering how I can speed up the math computations, which I anticipate will be the bottleneck.

I'm not so much looking for advice on how to structure the program (e.g. use integer data types instead of floating point whenever possible, use smaller arrays, etc.), but rather, once the program is complete, what can be done further to speed things up.

For example, how do I ensure multiple CPU cores are automatically used based on demand? (Is this done by default, or do I need to manage the process somehow?)

Or, how do I do parallel processing (breaking a for-loop up among multiple cores and/or machines)?

Any practical advice is greatly appreciated. I'm sure I'm not the first to need this, so I'm hoping there are industry best practice approaches available that scale with demand.

Thanks in advance!

+1  A: 

You don't state what your setup is (Java, PHP, .NET? Will you be hosting the system yourself or will it be hosted somewhere?), so these are just some off-the-cuff thoughts:

As far as I know most modern systems that you are likely to be using will spread jobs over the processor cores available.

Spreading the workload over a number of servers can be done relatively easily with load balancing: http://www.loadbalancing.org/

You could also look at "cloud computing", where your application would be hosted by somebody like Amazon and you pay for what you use (more or less):

http://aws.amazon.com/ec2/

Other providers are available.

I'm fairly sure that if you provide more details you'll get some more specific answers.

Jaydee
Hi Jaydee, the setup is TBD, but most likely PHP (there seem to be many PHP programmers out there) on a Linux machine (which seems more reliable than a Windows PC). I'm hoping PHP can interface somehow with C programs for the backend processing, though there is some flexibility to change that for a good reason. Hosting will be outsourced. I think load balancing refers to sending different sessions to different computers, but in my case each session could involve a lot of computation that I'm hoping can be distributed somehow (like the nested for-loop math). Let me know if there are any other details I can provide.
ggkmath
How many concurrent users are you expecting? If it's a lot, there may be little difference between spreading users and spreading calculations.
Jaydee
Good question. The number of users will increase as business increases, so a scalable solution would be great. I'd guess-timate anywhere from 5-30 users simultaneously accessing the routines (worst case), but this is truly a guess. I REALLY want to avoid having the machine start swapping to virtual memory, which would slow a user's process to a crawl and likely cause them to exit in frustration.
ggkmath
+4  A: 

FFT methods are highly parallelizable, especially in multiple dimensions.

Classical implementations are FFTW and Intel MKL.

One approach (depending on available hardware) is a pool of worker threads (or processes, depending on configuration).

At my job, we have had much success with a pool of PCs and as-simple-as-possible data packets, which get queued, computed (on multiple cores) by one PC, and sent back to the user.

Don't try to micro-optimize the math; use one of the above libraries instead. Focus on designing the packets, queuing the computations (don't forget some kind of quota/priority scheme), and making sure computed data is reliably sent back to the thread that has to join the packets together.

Depending on the hardware (enormous SMP computer or PC farms), the problems are different.

(If you have the choice, go for PC farms.)

Edit: You may want to consider OpenMP to automatically parallelize loops. As for PC farms, they offer advantages over big SMP machines from a flexibility point of view: they scale well, they are not that expensive, and they can be bought/sold/reused efficiently. Linux is probably a good choice, but it depends on which environment you're comfortable with.

Sadly, I must say there are (to my knowledge) no good libraries for distributing computational requests reliably and efficiently over PC farms. The problem is quite hard, since you must account for machine failures, network communication, congestion, process distribution...

Alexandre C.
Thanks Alexandre. Yes, FFTW would be my choice, but there is other complex math with nested for-loops that I require (believe me, I wouldn't want to try to optimize an FFT routine myself). I'm thinking of going with Linux instead of Windows for better reliability. Why do you prefer PC farms over an SMP computer? Also, when you talk about queuing, computing, and sending data packets, is there software available for this type of application, or must one write custom code? Could you describe the guts of this process in more detail? Thanks!
ggkmath
Is MPI (Message Passing Interface) a viable option?
ggkmath
@ggkmath: It depends. If you want to spread one single computation over multiple machines, it is. If you manage to split your computation into little packets that get computed independently by the different machines, it is not necessary.
Alexandre C.