Hi Experts,
I'm interested in any conventional wisdom on how to approach the following problem. Note that I'm a hardware guy, so please go easy on the industry jargon and acronyms.
I'm providing an online application that includes very complex math computations, such as fast Fourier transforms (FFTs), that involve nested for-loops and very large data arrays (1.6 GB each). Users on the internet will access this application, enter some custom parameters, and submit a job that calls these math computations. To keep each user's wait to a minimum, and to allow multiple independent sessions for multiple simultaneous users (each user having a separate thread), I'm wondering how I can speed up the math computations, which I anticipate will be the bottleneck.
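To make that concrete, here's a rough sketch of the kind of per-user job dispatch I have in mind, assuming a Java implementation purely for illustration (the language isn't decided, and runFft is just a placeholder for the real math routine):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JobDispatcher {
    // Bound the pool to the number of CPU cores so simultaneous user jobs
    // don't oversubscribe the machine.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Submit one user's job; the call returns immediately and the heavy
    // math runs on a worker thread from the pool.
    public Future<double[]> submitJob(double[] inputData, double someParameter) {
        return pool.submit(() -> runFft(inputData, someParameter));
    }

    // Placeholder for the actual FFT / math routine.
    private double[] runFft(double[] data, double param) {
        // ... heavy computation here ...
        return data;
    }
}
```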
I'm not so much looking for advice on how to structure the program (e.g. use integer data types instead of floating point whenever possible, use smaller arrays, etc.); rather, I'm interested in what can be done to speed things up further once the program is complete.
For example, how do I ensure that multiple CPU cores are used automatically based on demand? (Does this happen by default, or do I need to manage it somehow? See the sketch below for what I mean.)
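My understanding is that a plain for-loop in a single thread only keeps one core busy, and that I'd have to ask for parallelism explicitly, something like the following (again just a Java sketch for illustration), but I'd like confirmation:

```java
import java.util.stream.IntStream;

public class MultiCoreDemo {
    public static void main(String[] args) {
        double[] data = new double[10_000_000];

        // A plain for-loop runs on a single core, no matter how many are available.
        for (int i = 0; i < data.length; i++) {
            data[i] = Math.sin(i) * Math.cos(i);
        }

        // A parallel stream explicitly asks the runtime to split the index
        // range across the available cores.
        IntStream.range(0, data.length)
                 .parallel()
                 .forEach(i -> data[i] = Math.sin(i) * Math.cos(i));
    }
}
```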
Or, how do I do parallel processing, i.e. breaking a for-loop up among multiple cores and/or machines?
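For the multiple-cores case, here's a rough sketch (Java again, purely for illustration; Math.sqrt is just a stand-in for the real computation) of what I imagine "breaking the loop up" would look like. Is manually chunking it like this the standard approach, or is there something better?

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedLoop {
    public static void main(String[] args) throws Exception {
        double[] data = new double[8_000_000];
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Split the index range into one contiguous chunk per core.
        int chunk = (data.length + cores - 1) / cores;
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int c = 0; c < cores; c++) {
            final int start = c * chunk;
            final int end = Math.min(start + chunk, data.length);
            tasks.add(() -> {
                for (int i = start; i < end; i++) {
                    data[i] = Math.sqrt(i);   // stand-in for the real math
                }
                return null;
            });
        }

        // invokeAll blocks until every chunk has finished.
        pool.invokeAll(tasks);
        pool.shutdown();
    }
}
```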
Any practical advice is greatly appreciated. I'm sure I'm not the first to need this, so I'm hoping there are industry best practice approaches available that scale with demand.
Thanks in advance!