My current task is to optimise a Monte Carlo Simulation that calculates Capital Adequacy figures by region for a set of Obligors.

It is running about 10x too slow for where it will need to be in production, given the number of daily runs required. Additionally, the granularity of the result figures will need to be improved, down to desk or possibly book level at some stage. The code I've been given is basically a prototype, which is used by business units in a semi-production capacity.

The application is currently single-threaded, so I'll need to make it multi-threaded. I may look at System.Threading.ThreadPool or the Microsoft Parallel Extensions library, but I'm constrained to .NET 2.0 on the server at this bank, so I may have to consider this port: http://www.codeproject.com/KB/cs/aforge_parallel.aspx.
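For what it's worth, on plain .NET 2.0 you can get a long way with just ThreadPool and a set of wait handles, assuming the per-obligor simulations are independent. A minimal sketch (SimulateObligor here is a stand-in for the real calculation, not your actual method):

```csharp
using System;
using System.Threading;

class ParallelRunner
{
    // Stand-in for the real per-obligor Monte Carlo work (hypothetical).
    static double SimulateObligor(int obligorId)
    {
        return obligorId * 2.0;
    }

    // Partition the obligors across ThreadPool workers (.NET 2.0 compatible:
    // anonymous delegates, no lambdas or Parallel.For needed).
    public static double[] RunAll(int obligorCount)
    {
        double[] results = new double[obligorCount];
        int workers = Environment.ProcessorCount;
        int chunk = (obligorCount + workers - 1) / workers;
        ManualResetEvent[] done = new ManualResetEvent[workers];

        for (int w = 0; w < workers; w++)
        {
            // Declared inside the loop so each delegate captures its own copy.
            int start = w * chunk;
            int end = Math.Min(start + chunk, obligorCount);
            ManualResetEvent evt = done[w] = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(delegate
            {
                for (int i = start; i < end; i++)
                    results[i] = SimulateObligor(i);
                evt.Set();
            });
        }
        WaitHandle.WaitAll(done);  // block until every chunk has finished
        return results;
    }
}
```

Each worker writes only its own slice of the results array, so no locking is needed in the hot loop. The catch is that each thread also needs its own random number generator instance; sharing one Mersenne Twister across threads will either corrupt its state or force a lock around every draw.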

I am trying my best to get them to upgrade to .NET 3.5 SP1 but it's a major exercise in an organisation of this size and might not be possible in my contract time frames.

I've profiled the application using the trial of dotTrace (http://www.jetbrains.com/profiler). What other good profilers exist? Free ones?

A lot of the execution time is spent generating uniform random numbers and then translating them into normally distributed random numbers. They are using a C# Mersenne Twister implementation. I am not sure where they got it, or whether it's the best way to go about this (or the best implementation) for generating the uniform random numbers. The uniform numbers are then translated to a normally distributed version for use in the calculation (I haven't delved into the translation code yet).

Also, what is your experience using the following?

Any alternatives you know of? I'm a C# developer so would prefer C#, but a wrapper to C++ shouldn't be a problem, should it?

It might be even faster to leverage the C++ implementations. I am thinking some of these libraries will have the fastest methods to directly generate normally distributed random numbers, without the translation step. They may also have other functions that will be helpful in the subsequent calculations.

Also, the machine this runs on is a quad-core Opteron 275 with 8 GB of memory, but Windows Server 2003 Enterprise 32-bit. Should I advise them to upgrade to a 64-bit OS? Any links to articles supporting that decision would really be appreciated.

Anyway, any advice and help you may have is really appreciated.

+1  A: 

Have you considered pointing a profiler at your code? I've seen cases where simple fixes yield very significant improvements, like switching a couple of properties over to fields.

Jake Pearson
I have tried with the trial of dotTrace but the results weren't that granular. I will have a try with Red Gate, if they have a free trial?
m3ntat
I think they do; ANTS has saved the day for me a bunch of times.
Jake Pearson
A: 

Being constrained to use .Net in the first place for a large-scale simulation is going to cost you quite a bit of performance right up front...but that said...

If you're running a pure C# implementation of the Mersenne Twister, it's likely that you'll have a hard time tweaking all the performance you can out of it. If you check out the Mersenne Twister reference implementation, you'll see they have a C version that is heavily optimized for SSE-capable processors; it is very fast. I don't believe it's possible in C# (or at least, I'm not aware how) to force the use of SSE instructions with that level of optimization. I'd suggest writing a C++/CLI wrapper (or a P/Invoke wrapper) around the Mersenne Twister libraries and seeing how that affects your performance. However, you'll have to be careful with managed/unmanaged marshalling affecting your performance, as I have seen other posts here on SO about that issue (though I can't seem to find them right now...).
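The P/Invoke route can be as small as a couple of [DllImport] declarations. This is only a sketch: the DLL name and the exported entry points below are placeholders for whatever your native Mersenne Twister build actually exports. The important point is to fill a whole buffer per call rather than crossing the managed/unmanaged boundary once per number, which is where marshalling costs would otherwise kill you.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMT
{
    // Hypothetical exports: the DLL name and function names depend on how
    // the native Mersenne Twister (e.g. the SSE-optimized build) is compiled.
    [DllImport("mt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void mt_init(uint seed);

    // One managed/unmanaged transition amortised over thousands of numbers.
    // double[] is blittable, so the runtime pins it rather than copying.
    [DllImport("mt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void mt_fill_uniform(double[] buffer, int count);

    public static void Seed(uint seed)
    {
        mt_init(seed);
    }

    public static double[] NextBatch(int count)
    {
        double[] buffer = new double[count];
        mt_fill_uniform(buffer, count);
        return buffer;
    }
}
```

You would then draw from the returned batch in your inner loop and only call back into native code when the batch is exhausted.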

I may generate some flame for saying this, but if performance is a significant concern in your application, well-written C or C++ is almost always going to be preferable to any managed or interpreted language.

Not Sure
I actually disagree fairly strongly with your last statement... C# can perform very, very well. It does require profiling, but in my experience it's much easier to profile C# and improve it to the point where it can outperform C and C++, especially if you take advantage of the right libraries and understand how to write tight, highly performant code in C#.
Reed Copsey
I want to constructively disagree, too :)
peterchen
@Reed - Let me clarify - I'm not talking about the ease of profiling, nor the tools available, nor the difficulty of optimizing. I'm asserting that for any program written in an interpreted or managed language, it can be proven that a functionally equal program with equal or better performance can be written in an unmanaged language.
Not Sure
+3  A: 

I have found the Mersenne Twister to be quick. The problem may be in the algorithm (Box-Muller) used to transform the uniform distribution to a Gaussian distribution. The standard algorithm looks like:

y1 = sqrt( - 2 ln(x1) ) cos( 2 pi x2 )
y2 = sqrt( - 2 ln(x1) ) sin( 2 pi x2 )

where x1 and x2 are uniform random numbers and y1 and y2 are the Gaussian-distributed outputs.

The square roots are slow, but the trig is worse, and the method is numerically unstable when x1 is close to 0. Taygeta's page on the subject gives a faster, trig-free version (in pseudocode):

float x1, x2, w, y1, y2;

do {
    x1 = 2.0 * ranf() - 1.0;
    x2 = 2.0 * ranf() - 1.0;
    w = x1 * x1 + x2 * x2;
} while ( w >= 1.0 );

w = sqrt( (-2.0 * ln( w )) / w );
y1 = x1 * w;
y2 = x2 * w;

If they're not using something like this, you may be able to speed things up quite a bit by avoiding the trig functions or even pre-generating the random numbers.
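The polar form above translates to C# almost line for line. A minimal sketch (the generator is constructed around System.Random here just to be self-contained; you'd pass in the Mersenne Twister instead), which also caches the second deviate so nothing from each pair is wasted:

```csharp
using System;

class PolarGaussian
{
    private readonly Random _uniform;   // swap in the Mersenne Twister here
    private double _spare;
    private bool _hasSpare;

    public PolarGaussian(Random uniform)
    {
        _uniform = uniform;
    }

    // Marsaglia polar method: no sin/cos, one sqrt and one log per pair.
    public double Next()
    {
        if (_hasSpare)
        {
            _hasSpare = false;
            return _spare;
        }

        double x1, x2, w;
        do
        {
            x1 = 2.0 * _uniform.NextDouble() - 1.0;
            x2 = 2.0 * _uniform.NextDouble() - 1.0;
            w = x1 * x1 + x2 * x2;
        } while (w >= 1.0 || w == 0.0);   // reject points outside the unit circle

        w = Math.Sqrt(-2.0 * Math.Log(w) / w);
        _spare = x2 * w;                  // cache the second deviate
        _hasSpare = true;
        return x1 * w;
    }
}
```

The extra `w == 0.0` guard avoids taking log of zero in the (vanishingly rare) case where both uniforms land exactly on the origin.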

R Ubben
As a note, many modern processors have an assembly instruction for simultaneous calculation of sin and cos, and it is much cheaper than calling both sequentially. It's not available in any standard libraries, afaik, since it is a processor-specific feature.
Not Sure
Thanks @R Ubben, is your proposal the same as this http://en.wikipedia.org/wiki/Box-Muller_transformation or is this something different?
m3ntat
Yes, the polar form they describe. It is rejection sampling, so you throw some numbers away, but it still ends up much faster. Though I also work in banking, I was doing this for enjoyment: global illumination in a ray tracer. It did make a difference. If speed is still a problem, you could generate several hundred million offline between daily runs (depending on how many each run uses) and just read them in as needed, then fall back to generation if the store is exhausted.
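The offline-store idea above could look something like this sketch: dump a block of pre-generated deviates as raw doubles after each day's run, then read them back in one go before the next run (the file path and the idea of loading the whole block at once are arbitrary choices here, not part of the suggestion):

```csharp
using System;
using System.IO;

static class DeviateStore
{
    // Write a block of pre-generated deviates as raw little-endian doubles.
    public static void Save(string path, double[] deviates)
    {
        using (BinaryWriter w = new BinaryWriter(File.Create(path)))
        {
            foreach (double d in deviates)
                w.Write(d);
        }
    }

    // Read the whole block back; the caller falls back to live
    // generation once this store is exhausted.
    public static double[] Load(string path)
    {
        byte[] raw = File.ReadAllBytes(path);
        double[] deviates = new double[raw.Length / sizeof(double)];
        Buffer.BlockCopy(raw, 0, deviates, 0, raw.Length);
        return deviates;
    }
}
```

At several hundred million doubles the file runs into gigabytes, so on the 32-bit box you'd stream it in chunks with a BinaryReader rather than loading it whole as this sketch does.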
R Ubben
A: 

My experience is that the relative performance of C# vs. C++ is largely dependent on what you're doing. A great discussion of that here:

http://stackoverflow.com/questions/145110/c-performance-vs-java-c

For tight loops doing math (say, vector physics calculations), C++ is 2-3 times faster than C#, although the performance may be dominated by underlying functions like Sqrt().

I've taken a mixed language approach, (re)implementing the slowest code in C++/OpenMP with a managed C++/CLI wrapper. This allows you to only "pay for what you use".

There's a summary of how to wrap native C/C++ with C++/CLI here:

http://msdn.microsoft.com/en-us/library/ms235281.aspx

Once you get the hang of C++/CLI it's pretty easy to get things running.

Ade Miller